Driving libzip with CFFI
NOTE: This post is a little dated, as it does not make use of features in newer CFFI releases.
There is a huge number of C libraries that provide functionality that would be very useful in an application. One way to access this functionality from Tcl is to write a binding using Tcl's C API. This requires fairly detailed knowledge of both C and Tcl's API. An alternative is to use a Foreign Function Interface (FFI) extension for Tcl such as cffi or Ffidl.
This post shows how to use the CFFI package to access the libzip archiving library. It is not intended as a step-by-step tutorial but rather as promotional material for the CFFI package (yes, even open source needs to be promoted). At the same time, it is not aimed at throwaway code where you can get away with fast and loose practices in error checking and the like. Rather, it illustrates some of the considerations in implementing a complete package based on CFFI.
We start with the usual boilerplate loading packages and setting the namespace path to save typing.
package require cffi
namespace path ::cffi
Loading a library
The next step is to load the shared libzip library by creating a wrapper for it. We do not specify the full path here, even though that is recommended, because the location is system dependent. On Unix/Linux platforms, the library can be installed using the system package manager. On Windows you will need to download binaries from https://www.libzip.org or build the library yourself.
dyncall::Library create libzip libzip.so
By convention, the wrapper object is named after the shared library.
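Since the file name of the shared library varies across platforms and libzip versions, a package would more likely try a few candidates than hard-code libzip.so. The sketch below does that. The helper name load_libzip and the candidate file names are illustrative assumptions, not a definitive list, and it uses the same dyncall::Library class as above (in line with the note at the top, newer CFFI releases may provide a different wrapper class).

```tcl
package require cffi
namespace path ::cffi

# Hypothetical helper: create the wrapper object ::libzip from the first
# candidate file name that loads. The candidate names are assumptions;
# adjust them (or use full paths) for your installation.
proc load_libzip {} {
    set ext [info sharedlibextension]   ;# e.g. .so, .dll or .dylib
    set tried {}
    foreach name [list libzip$ext libzip.so.5 zip$ext] {
        if {![catch {::cffi::dyncall::Library create ::libzip $name} msg]} {
            return $name
        }
        # Remember why this candidate failed for the final error message.
        lappend tried "$name: $msg"
    }
    error "Could not load libzip. Tried:\n[join $tried \n]"
}

puts "Loaded libzip from [load_libzip]"
```

Collecting the individual failure messages makes the final error more useful than a bare "not found" when none of the candidates load.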
Declaring functions
The first function we need is one to open a ZIP archive. It has the following C prototype:
zip_t *zip_open(const char *path, int flags, int *errorp);
Accordingly, we define a CFFI function. As a first step, translating the above prototype directly, we get:
libzip function zip_open pointer.zip_t {
    path string
    flags int
    err {int out}
}
Note that we have not defined zip_t, and as long as we only pass it to the library as an opaque value, we do not need to. Here it merely acts as a tag to catch type errors when pointers are passed around.
Dealing with errors
Now, if we lived in a perfect world where errors were impossible, that declaration would suffice. Since we sadly do not, the first thing to ask is how the wrapped functions behave in case of errors. For starters, wrapped functions generate error messages when called incorrectly, just like normal Tcl commands.
% zip_open
Syntax: zip_open path flags err
What about errors from the function itself, for example if the ZIP archive does not exist?
% set zipper [zip_open nosuchfile.zip 0 err]
0x0000000000000000^
% set err
9
As per the documentation, a NULL pointer is returned. So we could write code like this:
if {[pointer isnull $zipper]} {
    # ... error handling ...
}
But that is really not the Tcl way. Consider how the open command raises an exception in similar circumstances. We can achieve the same behavior here by adding error-checking annotations to our declaration, which now becomes:
libzip function zip_open {pointer.zip_t nonzero} {
    path string
    flags int
    err {int out storealways}
}
The nonzero annotation on the function return value mandates that a return value of 0 (NULL for pointers) be treated as an error and an exception raised. The storealways annotation on the err parameter is a little more subtle. Normally, CFFI does not store output values on error returns, since called functions may not initialize output parameters at all when an error occurs. This is unimportant for scalars as in this example, but crucial when structs and strings are involved. To receive the error code even when an error occurs, we therefore apply the storealways annotation. (We could also have used the storeonerror annotation, which stores the value only on error and not on a successful return.)
We now get an exception as desired, with err set to the error code.
% zip_open nosuchfile.zip 0 err
Invalid value "0x0000000000000000^". Function returned NULL pointer.
% set err
9
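For comparison, here is a minimal sketch of the storeonerror variant mentioned earlier. It assumes, as the rest of this post does, that libzip.so is on the library search path, and it differs from the declaration above only in the annotation on err.

```tcl
package require cffi
namespace path ::cffi

dyncall::Library create libzip libzip.so

# Same declaration as in the post, but using storeonerror instead of
# storealways: err is written only when zip_open fails.
libzip function zip_open {pointer.zip_t nonzero} {
    path string
    flags int
    err {int out storeonerror}
}

# On failure, nonzero raises an exception and err receives the libzip
# error code.
if {[catch {zip_open nosuchfile.zip 0 err} msg]} {
    puts "zip_open failed: $msg (libzip error code $err)"
}
```

storeonerror fits when the output parameter is only meaningful on failure, as with this error code; storealways fits when the function fills it in on both paths.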
Declaring structs
The error code is meaningless to the user, and since we are writing a "production quality" package, we should make use of the library's error description facilities. The error interfaces in libzip are a teeny bit more involved than expected. At the C level, the following definitions are required:
struct zip_error {
    int zip_err;    /* libzip error code (ZIP_ER_*) */
    int sys_err;    /* copy of errno (E*) or zlib error code */
    char *str;      /* string representation or NULL */
};
typedef struct zip_error zip_error_t;
void zip_error_init_with_code(zip_error_t *error, int ze);
const char *zip_error_strerror(zip_error_t *ze);
We first have to initialize a zip_error_t struct by calling zip_error_init_with_code and then extract an error string from it via zip_error_strerror. Note that the str field of the struct is not meant to be accessed directly, as it might not be filled in until zip_error_strerror is called.
Our corresponding CFFI declarations read:
Struct create zip_error_t {
    zip_err int
    sys_err int
    str {pointer unsafe}
}
libzip functions {
    zip_error_init_with_code void {
        error {struct.zip_error_t out}
        ze int
    }
    zip_error_strerror string {ze {struct.zip_error_t byref}}
}
proc raise_zip_error {err} {
    zip_error_init_with_code zip_err $err
    error [zip_error_strerror $zip_err]
}
The above wraps the two-liner for error translation in a proc, raise_zip_error, since we are likely to need it often. Our code can then be written as below to provide more helpful information.
% if {[catch {zip_open nosuchfile.zip 0 err} zipper]} {
    raise_zip_error $err
}
No such file
% if {[catch {zip_open text.txt 0 err} zipper]} {
    raise_zip_error $err
}
Not a zip archive
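Repeating the catch and raise_zip_error dance at every call site gets tedious. One option, sketched below under the same assumptions as the rest of the post, is a hypothetical open_archive proc that hides the err variable, together with a variant of raise_zip_error that also sets Tcl's errorCode so callers can react to libzip failures programmatically. The LIBZIP errorCode prefix is an arbitrary choice for illustration.

```tcl
package require cffi
namespace path ::cffi

dyncall::Library create libzip libzip.so

# Declarations as in the post.
libzip function zip_open {pointer.zip_t nonzero} {
    path string
    flags int
    err {int out storealways}
}
Struct create zip_error_t {
    zip_err int
    sys_err int
    str {pointer unsafe}
}
libzip functions {
    zip_error_init_with_code void {
        error {struct.zip_error_t out}
        ze int
    }
    zip_error_strerror string {ze {struct.zip_error_t byref}}
}

# Variant of raise_zip_error that also sets errorCode to {LIBZIP <code>}
# (the LIBZIP prefix is a hypothetical convention).
proc raise_zip_error {err} {
    zip_error_init_with_code zip_err $err
    return -code error -errorcode [list LIBZIP $err] \
        [zip_error_strerror $zip_err]
}

# Hypothetical convenience wrapper: returns the archive handle or raises
# a translated libzip error.
proc open_archive {path {flags 0}} {
    if {[catch {zip_open $path $flags err} zipper]} {
        raise_zip_error $err
    }
    return $zipper
}
```

A caller could then write set zipper [open_archive test.zip], and single out libzip failures with try ... trap LIBZIP {msg opts} {...}.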
Listing files
But we are not writing an extension just to print errors, so let us move on to doing "useful" work, say, listing the files in an archive. From the documentation, we see that two functions are needed for this purpose.
longlong zip_get_num_entries(zip_t *archive, int flags);
const char *zip_get_name(zip_t *archive, ulonglong index, uint flags);
The corresponding CFFI definitions would be:
libzip function zip_get_num_entries longlong {
    archive pointer.zip_t
    flags {int {default 0}}
}
libzip function zip_get_name string.utf-8 {
    archive pointer.zip_t
    index ulonglong
    flags {uint {default 0}}
}
The function zip_get_name returns strings encoded in UTF-8, hence the utf-8 encoding is specified on the string return type. CFFI then automatically performs the encoding conversion into Tcl string values. The flags parameter is given a default so it does not have to be specified on every call.
There is a subtler but very important point to note in the declaration of zip_get_name. The C function is declared as returning a const char*. Our CFFI declaration causes the result to be returned as a string, which is convenient but means the original pointer is lost. That would be a problem if the pointer were needed later, for example to free memory. However, as per the libzip API, this pointer needs no further handling (as long as the zip file handle stays open), so we can conveniently use the string return type. Otherwise, we would have had to declare it as a pointer and explicitly extract the string.
Listing files in an archive is straightforward.
% set zipper [zip_open test.zip 0 err]
0x00005599526ff590^zip_t
% set nfiles [zip_get_num_entries $zipper]
2
% for {set i 0} {$i < $nfiles} {incr i} {
    puts [zip_get_name $zipper $i]
}
foo.txt
bar.txt
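In a package, the loop above would more naturally live in a proc that returns a Tcl list. A minimal sketch, assuming the declarations from this post and an existing test.zip; list_entries is a hypothetical name, and the flags parameters are declared to match the C prototypes shown earlier.

```tcl
package require cffi
namespace path ::cffi

dyncall::Library create libzip libzip.so

libzip function zip_open {pointer.zip_t nonzero} {
    path string
    flags int
    err {int out storealways}
}
libzip function zip_get_num_entries longlong {
    archive pointer.zip_t
    flags {int {default 0}}
}
libzip function zip_get_name string.utf-8 {
    archive pointer.zip_t
    index ulonglong
    flags {uint {default 0}}
}
libzip function zip_close int {archive {pointer.zip_t dispose}}

# Hypothetical helper: return the names of all entries in an open
# archive as a Tcl list.
proc list_entries {zipper} {
    set names {}
    set n [zip_get_num_entries $zipper]
    for {set i 0} {$i < $n} {incr i} {
        lappend names [zip_get_name $zipper $i]
    }
    return $names
}

set zipper [zip_open test.zip 0 err]
puts [list_entries $zipper]
zip_close $zipper
```

Returning a list rather than printing keeps the proc usable from other code, for example with foreach or lsearch.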
Pointer safety
Our final task is to close the archive. The interface is simple, but there is an important point to note regarding pointer safety. By default, pointers returned from functions are registered as valid, or safe, pointers. We can list these with pointer list.
% pointer list
0x00005599526ff590^zip_t
Correspondingly, CFFI needs to know when a pointer is no longer valid. This is communicated through the dispose annotation on a pointer parameter, as in the CFFI declaration below (you can guess the corresponding C prototype).
libzip function zip_close int {archive {pointer.zip_t dispose}}
Calling the function marks the pointer as invalid, preventing further use.
% zip_close $zipper
0
% pointer list
% zip_close $zipper
Pointer 0x00005599526ff590^zip_t is not registered.
The list of valid pointers is now empty, and a second attempt to close the archive fails.
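Since zip_close both releases the archive and, through dispose, unregisters the pointer, it is worth making sure it is always called, even if the code using the archive raises an error. A sketch using Tcl 8.6's try/finally; with_archive is a hypothetical helper, and test.zip is assumed to exist.

```tcl
package require Tcl 8.6-   ;# for try/finally
package require cffi
namespace path ::cffi

dyncall::Library create libzip libzip.so

libzip function zip_open {pointer.zip_t nonzero} {
    path string
    flags int
    err {int out storealways}
}
libzip function zip_get_num_entries longlong {
    archive pointer.zip_t
    flags {int {default 0}}
}
libzip function zip_close int {archive {pointer.zip_t dispose}}

# Hypothetical helper: open PATH, bind the handle to VARNAME in the
# caller's scope, evaluate BODY there, and always close the archive,
# even if BODY raises an error. Returns the result of BODY.
proc with_archive {varname path body} {
    upvar 1 $varname zipper
    # Keep a private copy so BODY cannot change what gets closed.
    set handle [zip_open $path 0 err]
    set zipper $handle
    try {
        uplevel 1 $body
    } finally {
        zip_close $handle
    }
}

with_archive z test.zip {
    puts "[zip_get_num_entries $z] entries"
}
```

After with_archive returns, z still holds the pointer value, but since the pointer is no longer registered, CFFI will reject any further attempt to use it.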
Final words
Hopefully this post has shown that accessing at least some C libraries with an FFI extension is significantly simpler than writing a binding in C. That said, it is worth keeping in mind that a "professional quality" Tcl package based on a library needs:
- An understanding of the library's API and usage idioms
- A binding to access the functionality from Tcl
- Documentation
- Test suite
- Packaging
An FFI only helps with the second of these. Nevertheless, that alone saves significant effort and makes incremental development and experimentation easier.
More complete examples of writing packages using CFFI can be found in the CFFI repository.