Driving libzip with CFFI


There is a huge number of C libraries that provide functionality that would be very useful in an application. One way to access this functionality from Tcl is to write a binding using Tcl's C API. This requires fairly detailed knowledge of both C and Tcl's C API. An alternative is to use a Foreign Function Interface (FFI) extension for Tcl such as CFFI or Ffidl.

This post shows how to use the CFFI package to access the libzip archiving library. It is not intended to be a step-by-step tutorial but rather promotional material for the CFFI package (yes, even open source needs to be promoted). At the same time, it is not aimed at throwaway code where you can get away with fast and loose error checking. Rather, it illustrates some of the considerations in implementing a complete package based on CFFI.

We start with the usual boilerplate: loading the package and setting the namespace path to save typing.

package require cffi
namespace path ::cffi

Loading a library

The next step is to load the libzip shared library by creating a wrapper for it. We do not specify the full path here because its location is system dependent, although specifying it is recommended in practice. On Unix/Linux platforms, the library can be installed using the system package manager. On Windows, you will need to download binaries from https://www.libzip.org or build the library yourself.

dyncall::Library create libzip libzip.so

By convention, the wrapper object is named after the shared library.
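Because the file name of the library differs across platforms and installations (on many Linux distributions the unversioned libzip.so link is only installed with the development package), a package will usually need to try more than one name. The sketch below is a hedged illustration rather than a recommended list: the candidate names libzip.so.5 and zip.dll are assumptions about typical installations, and it assumes a fresh interpreter in which no libzip command exists yet.

```tcl
package require cffi
namespace path ::cffi

# Candidate file names for libzip. These are assumptions: the actual
# name depends on the platform and on how libzip was installed.
# [info sharedlibextension] gives .so, .dylib or .dll as appropriate.
set candidates [list libzip[info sharedlibextension] libzip.so.5 zip.dll]

set loaded 0
foreach name $candidates {
    # A failed load raises a Tcl error; TclOO destroys the
    # half-constructed object, so the next name can be tried.
    if {![catch {dyncall::Library create libzip $name}]} {
        set loaded 1
        break
    }
}
if {!$loaded} {
    error "Could not load libzip (tried: [join $candidates {, }])"
}
```

In a real package the list would more likely come from configuration or from a path relative to the package directory.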

Declaring functions

The first function we will need is one to open a ZIP archive. It has the following C prototype:

zip_t *zip_open(const char *path, int flags, int *errorp);

Accordingly, we define a CFFI function. As a first step, directly translating the above prototype, we get:

libzip function zip_open pointer.zip_t {
    path string
    flags int
    err {int out}
}

Note that we have not defined zip_t; as long as we only pass it to the library as an opaque value, we do not need to. Here it merely acts as a tag to catch type errors when pointers are passed around.
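The flags argument of zip_open selects the open mode. As a sketch, the constants below are copied from the ZIP_* definitions in zip.h (verify them against the header you actually build with); they are kept as plain Tcl variables here, and test.zip is an assumed sample archive. With this first declaration, a failed open can only be detected by checking for a NULL pointer.

```tcl
package require cffi
namespace path ::cffi
dyncall::Library create libzip libzip.so

libzip function zip_open pointer.zip_t {
    path string
    flags int
    err {int out}
}

# Open-mode flags as defined in zip.h. They are bit values and can be
# combined, e.g. [expr {$ZIP_CREATE | $ZIP_TRUNCATE}].
set ZIP_CREATE    1
set ZIP_EXCL      2
set ZIP_CHECKCONS 4
set ZIP_TRUNCATE  8
set ZIP_RDONLY    16

# Open read-only. No error annotations are used yet, so a failure
# shows up only as a NULL pointer with err holding the libzip code.
set zipper [zip_open test.zip $ZIP_RDONLY err]
if {[pointer isnull $zipper]} {
    puts "zip_open failed with libzip error code $err"
}
```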

Dealing with errors

If we lived in a perfect world where errors were impossible, that declaration would suffice. Since we sadly do not, the first thing to ask is how the wrapped functions behave when errors occur. For starters, just like normal Tcl commands, wrapped functions raise an error with a usage message if called incorrectly.

% zip_open
Syntax: zip_open path flags err

What about errors from the function itself, for example if the ZIP archive does not exist?

% set zipper [zip_open nosuchfile.zip 0 err]
% set err

As per the documentation, a NULL pointer is returned. So we could write code like this:

if {[pointer isnull $zipper]} {
    # ...error handling
}

But that is not really the Tcl way. Consider how the open command raises an exception in similar circumstances. We can get the same behavior here by adding error-checking annotations to our declaration, which now becomes:

libzip function zip_open {pointer.zip_t nonzero} {
    path string
    flags int
    err {int out storealways}
}

The nonzero annotation on the return value declaration specifies that a return value of 0 (NULL for pointers) is to be treated as an error and an exception raised. The storealways annotation on the err parameter is a little more subtle. Normally, CFFI does not store output values on error returns, since called functions may not initialize output parameters at all when they fail. This matters little for scalars like the one in this example, but is crucial when structs and strings are involved. To receive the error code even when an error occurs, we therefore apply the storealways annotation. (We could also have used the storeonerror annotation, which stores the value only on error and not on a successful return.)

We now get an exception as desired, with err set to the error code.

% zip_open nosuchfile.zip 0 err
Invalid value "0x0000000000000000^". Function returned NULL pointer.
% set err
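For comparison, here is the same declaration using the storeonerror annotation mentioned above, written as a standalone sketch (nosuchfile.zip is assumed not to exist). With this variant, err is written only when zip_open returns NULL, so after a successful call any previous value of err is left untouched.

```tcl
package require cffi
namespace path ::cffi
dyncall::Library create libzip libzip.so

# err is stored only when the nonzero check fails.
libzip function zip_open {pointer.zip_t nonzero} {
    path string
    flags int
    err {int out storeonerror}
}

unset -nocomplain err
if {[catch {zip_open nosuchfile.zip 0 err} result]} {
    # result holds the CFFI error message, err the libzip error code.
    puts "open failed: $result (libzip error code $err)"
}
```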

Declaring structs

The error code is meaningless to the user, and since we are writing a "production quality" package, we should make use of the library's error description facilities. The error interfaces in libzip are a teeny bit more involved than expected. At the C level, the following definitions are involved:

struct zip_error {
    int zip_err; /* libzip error code (ZIP_ER_*) */
    int sys_err; /* copy of errno (E*) or zlib error code */
    char *str;   /* string representation or NULL */
};
typedef struct zip_error zip_error_t;
void zip_error_init_with_code(zip_error_t *error, int ze);
const char *zip_error_strerror(zip_error_t *ze);

We first have to initialize a zip_error_t struct by calling zip_error_init_with_code and then extract an error string from it via zip_error_strerror. Note that the str field of the struct is not meant to be accessed directly, as it might not be filled in until zip_error_strerror is called.

Our corresponding CFFI declarations are:

Struct create zip_error_t {
    zip_err int
    sys_err int
    str     {pointer unsafe}
}
libzip functions {
    zip_error_init_with_code void {
        error {struct.zip_error_t out}
        ze    int
    }
    zip_error_strerror string {ze {struct.zip_error_t byref}}
}
proc raise_zip_error {err} {
    zip_error_init_with_code zip_err $err
    error [zip_error_strerror $zip_err]
}

The above wraps the two-line error translation in a proc, raise_zip_error, since we are likely to need it often. Our code can then be written as below to provide more helpful information.

% if {[catch {zip_open nosuchfile.zip 0 err} zipper]} {
    raise_zip_error $err
}
No such file
% if {[catch {zip_open text.txt 0 err} zipper]} {
    raise_zip_error $err
}
Not a zip archive
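Having every caller repeat the catch is still a little clumsy. One option is a hypothetical open_archive proc that raises the translated message directly. The sketch below repeats the declarations from this post so it runs on its own; the errorCode format {LIBZIP code} is an invented convention for this sketch that lets callers single out libzip failures.

```tcl
package require cffi
namespace path ::cffi
dyncall::Library create libzip libzip.so

Struct create zip_error_t {
    zip_err int
    sys_err int
    str     {pointer unsafe}
}
libzip functions {
    zip_open {pointer.zip_t nonzero} {
        path string
        flags int
        err {int out storealways}
    }
    zip_error_init_with_code void {
        error {struct.zip_error_t out}
        ze    int
    }
    zip_error_strerror string {ze {struct.zip_error_t byref}}
}

# Hypothetical helper: opens an archive and, on failure, raises an
# error carrying libzip's message plus a machine-readable errorCode.
proc open_archive {path {flags 0}} {
    if {[catch {zip_open $path $flags err} zipper opts]} {
        if {![info exists err]} {
            # The C function was never called (e.g. bad arguments),
            # so there is no libzip error code; re-raise unchanged.
            return -options $opts $zipper
        }
        # Translate the libzip error code into its description.
        zip_error_init_with_code zerr $err
        throw [list LIBZIP $err] [zip_error_strerror $zerr]
    }
    return $zipper
}
```

A caller can then handle only libzip failures with try ... trap {LIBZIP} {msg opts} {...}, while other errors propagate as usual.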

Listing files

But we are not writing an extension just to print errors, so let us move on to doing "useful" work, such as listing the files in an archive. From the documentation, we see that two functions are needed for this purpose:

longlong zip_get_num_entries(zip_t *archive, int flags);
const char *zip_get_name(zip_t *archive, ulonglong index, uint flags);

The corresponding CFFI definitions are:

libzip function zip_get_num_entries longlong {
    archive pointer.zip_t
    flags {int {default 0}}
}
libzip function zip_get_name string.utf-8 {
    archive pointer.zip_t
    index ulonglong
    flags {int {default 0}}
}

The function zip_get_name returns strings encoded in UTF-8 and hence the utf-8 encoding is specified on the string return value. CFFI will then automatically do the needed encoding transformation into Tcl's string values. The flags parameter is given a default so it does not have to be specified on every call.

There is a subtler but very important point to note in the declaration of zip_get_name. The C function is declared as returning a char*. Our CFFI declaration causes the result to be returned as a string, which is convenient but means that the original returned pointer is lost. This would be a problem if the pointer were needed for freeing memory etc. However, since per the libzip API this pointer needs no further handling (it remains valid as long as the archive stays open), we can conveniently use the string return type. Otherwise, we would have had to declare it as a pointer and explicitly extract the string.

Listing files in an archive is straightforward.

% set zipper [zip_open test.zip 0 err]
% set nfiles [zip_get_num_entries $zipper]
% for {set i 0} {$i < $nfiles} {incr i} {
    puts [zip_get_name $zipper $i]
}
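For reuse, the loop is more convenient as a proc that returns a list. A sketch follows; the proc name zip_names is hypothetical, test.zip is an assumed sample archive, and zip_get_num_entries is declared with the flags argument from its C prototype, defaulted to 0 like that of zip_get_name.

```tcl
package require cffi
namespace path ::cffi
dyncall::Library create libzip libzip.so

libzip functions {
    zip_open {pointer.zip_t nonzero} {
        path string
        flags int
        err {int out storealways}
    }
    zip_get_num_entries longlong {
        archive pointer.zip_t
        flags {int {default 0}}
    }
    zip_get_name string.utf-8 {
        archive pointer.zip_t
        index ulonglong
        flags {int {default 0}}
    }
}

# Hypothetical helper: returns the names of all entries in an open
# archive as a Tcl list rather than printing them one by one.
proc zip_names {zipper} {
    set names {}
    set n [zip_get_num_entries $zipper]
    for {set i 0} {$i < $n} {incr i} {
        lappend names [zip_get_name $zipper $i]
    }
    return $names
}

# test.zip is an assumed sample archive in the current directory.
if {[file exists test.zip]} {
    set zipper [zip_open test.zip 0 err]
    foreach name [zip_names $zipper] {
        puts $name
    }
}
```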

Pointer safety

Our final task is to close the archive. The interface is simple, but there is an important point to note regarding pointer safety. By default, pointers returned from functions are registered as valid, or safe, pointers. We can list these with pointer list.

% pointer list

Correspondingly, CFFI needs to know when a pointer is no longer valid. This is communicated through the dispose annotation on a pointer parameter on the CFFI declaration below (you can guess the corresponding C prototype).

libzip function zip_close int {archive {pointer.zip_t dispose}}

Calling the function marks the pointer as invalid, preventing any further use.

% libzip function zip_close int {archive {pointer.zip_t dispose}}
% zip_close $zipper
% pointer list
% zip_close $zipper
Pointer 0x00005599526ff590^zip_t is not registered.

The list of valid pointers is now empty, and a second attempt to close the archive fails.
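Since the handle must be closed even if processing fails part-way, it can be convenient to tie zip_close to the lifetime of a script. The hypothetical with_archive proc below is a sketch using Tcl 8.6's try ... finally; as before, test.zip is an assumed sample archive and the declarations are repeated so the script runs standalone. It ignores the return value of zip_close, which per the libzip documentation is -1 on failure.

```tcl
package require cffi
namespace path ::cffi
dyncall::Library create libzip libzip.so

libzip functions {
    zip_open {pointer.zip_t nonzero} {
        path string
        flags int
        err {int out storealways}
    }
    zip_get_num_entries longlong {
        archive pointer.zip_t
        flags {int {default 0}}
    }
    zip_close int {archive {pointer.zip_t dispose}}
}

# Hypothetical helper: opens the archive, binds the handle to varName
# in the caller, runs script there and always closes the archive,
# even if the script raises an error.
proc with_archive {path varName script} {
    set handle [zip_open $path 0 err]
    upvar 1 $varName zipper
    set zipper $handle
    try {
        uplevel 1 $script
    } finally {
        # Close via the local copy in case the script changed varName.
        zip_close $handle
    }
}

# test.zip is an assumed sample archive in the current directory.
if {[file exists test.zip]} {
    with_archive test.zip z {
        puts "[zip_get_num_entries $z] entries"
    }
}
```

Because zip_close carries the dispose annotation, the handle is unregistered once the script finishes, so any stale copy kept by the script is rejected just like the second zip_close call above.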

Final words

Hopefully this post has shown that accessing at least some C libraries with an FFI extension is significantly simpler than writing a binding in C. That said, it is worth keeping in mind that a "professional quality" Tcl package based on a library needs:

  • An understanding of the library's API and usage idioms
  • A binding to access the functionality from Tcl
  • Documentation
  • A test suite
  • Packaging

An FFI helps only with the second of these. Nevertheless, that alone saves significant effort and makes incremental development and experimentation easier.