Outdated egg!

This is an egg for CHICKEN 4, the unsupported old release. You're almost certainly looking for the CHICKEN 5 version of this egg, if it exists.

If it does not exist, there may be equivalent functionality provided by another egg; have a look at the egg index. Otherwise, please consider porting this egg to the current version of CHICKEN.

bind

Generates wrappers from C/C++ source code.

Usage

 (require-extension bind)

Requirements

Documentation

This extension provides a parser for a restricted subset of C and C++ that makes it easy to generate foreign variable declarations, procedure bindings and C++ class wrappers. The parser is invoked via the bind form, which extracts binding information and generates the necessary code. An example:

(bind "double sin(double);")

(print (sin 3.14))

The parser generates code equivalent to

 (define sin (foreign-lambda double "sin" double))

Another example, here using C++. Consider the following class:

// file: foo.h

class Foo {
 private:
  int x_;
 public:
  Foo(int x);
  void setX(int x);
  int getX();
};

To generate a wrapper class that provides generic functions for the constructor and the setX and getX methods, we can use the following code:

; file: test-foo.scm
(require-extension bind coops cplusplus-object)

(bind* "#include \"foo.h\"")

(define x (new <Foo> 99))
(print (getX x))              ; prints 99
(setX x 42)
(print (getX x))              ; prints 42
(delete x)

Provided the file foo.o contains the implementation of the class Foo, the example could be compiled like this (assuming a UNIX-like environment):

% csc test-foo.scm foo.o -c++

To use the C++ interface, the coops extension is needed. Additionally, the class <c++-object> must be available, which is provided in the cplusplus-object extension.

To use this facility, you can either use the syntactic forms provided by the bind extension or run the chicken-bind standalone program to process a C/C++ file and generate a file containing wrapper code which then can be compiled with the CHICKEN compiler.

As a debugging aid, you can pass -debug F to the Scheme compiler to see the generated wrapper code.

chicken-bind accepts a number of command-line options that enable or disable various translation options; enter

 % chicken-bind -help

for more information.

General operation

The parser will generally perform the following functions:

Basic token-substitution of macros defined via #define is performed. The preprocessor commands #ifdef, #ifndef, #else, #endif, #undef and #error are handled. The preprocessor commands #if and #elif are not supported and will signal an error when encountered by the parser, because C expressions (even if constant) are not parsed. The preprocessor command #pragma is allowed but will be ignored.

During processing of C code, the macro CHICKEN is defined (similar to the C compiler option -DCHICKEN).

Macro and type definitions are available in subsequent bind declarations. Each declared C variable generates a procedure of zero or one argument with the same name as the variable. When called with no arguments, the procedure returns the current value of the variable; when called with one argument, it sets the variable to the value of that argument. C and C++ style comments are supported. Variables declared as const generate normal Scheme variables, bound to the initial value of the variable.
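To make the zero-or-one-argument convention concrete, here is a small C model of such an accessor. It is a sketch for illustration only: the variable counter and the function counter_access are hypothetical, and the wrapper bind really generates is a Scheme procedure, so from Scheme you would simply write (counter) to read the value and (counter 5) to set it.

```c
#include <assert.h>

/* A C variable that bind would expose (hypothetical example). */
static int counter = 0;

/* Model of the generated accessor procedure:
   has_arg == 0 -> return the current value, like (counter);
   has_arg == 1 -> assign value first, like (counter 5).
   Returning the value after an assignment is only a convenience of this
   sketch; it says nothing about what the Scheme setter returns. */
static int counter_access(int has_arg, int value)
{
    assert(has_arg == 0 || has_arg == 1); /* zero or one argument */
    if (has_arg)
        counter = value;
    return counter;
}
```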

Function, member-function and constructor/destructor definitions may be preceded by the ___safe qualifier, which marks the function as (possibly) performing a callback into Scheme. If a wrapped function calls back into Scheme code and ___safe has not been given, very strange and hard-to-debug problems will occur.

Functions and member functions prefixed with ___discard whose result type maps to a Scheme string (c-string) have their result type changed to c-string* instead, so the returned C string is freed after it has been copied into a Scheme string.
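The difference between c-string and c-string* results can be sketched in C. Assuming a hypothetical function make_greeting that returns a freshly malloc()ed string, a c-string* result is copied and the C pointer is then released with free(); call_with_c_string_star models that step, with a caller-supplied buffer standing in for the Scheme string. None of these names belong to bind or CHICKEN.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C function whose result the caller must free(). */
static char *make_greeting(void)
{
    const char *text = "hello";
    char *s = malloc(strlen(text) + 1);
    if (s != NULL)
        strcpy(s, text);
    return s;
}

/* Model of c-string* result handling: copy the returned characters into
   out (standing in for the new Scheme string), then free() the C pointer.
   A plain c-string result would be copied the same way but never freed.
   Returns the number of characters copied. */
static size_t call_with_c_string_star(char *(*fn)(void), char *out, size_t outsize)
{
    char *r = fn();
    size_t n;
    assert(r != NULL && outsize > 0);
    n = strlen(r);
    if (n >= outsize)
        n = outsize - 1;   /* truncate to fit the buffer */
    memcpy(out, r, n);
    out[n] = '\0';
    free(r);
    return n;
}
```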

Constants (as declared by #define or enum) are not visible outside the current compilation unit unless the export-constants option has been enabled. Only numeric or character constants are directly supported.

Function arguments may be preceded by ___in, ___out and ___inout qualifiers to specify values that are passed to a function by reference, or returned by reference. Only basic types (booleans, numbers and characters) can be passed this way. During the call, a pointer to a temporary piece of storage containing the initial value (or an unspecified value, for ___out parameters) is allocated and passed to the wrapped function. This storage is subject to garbage collection and will move if a callback into Scheme triggers a garbage collection. Multiple ___out and ___inout parameters are returned as multiple values, preceded by the normal return value of the function (if not void). Here is a simple example:

(bind* #<<EOF
#ifndef CHICKEN
#include <math.h>
#endif
double modf(double x, ___out double *iptr);
EOF
)
(let-values ([(frac int) (modf 33.44)])
  (print int " " frac))   ; integral and fractional parts of 33.44
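The following C sketch models what happens for an ___out parameter, using a hypothetical divmod function instead of modf. It is a simplification: the temporary lives on the C stack here, whereas bind uses garbage-collected storage, and a small struct stands in for the Scheme multiple values (normal result first, then the ___out value).

```c
#include <assert.h>

/* Hypothetical C function with an ___out-style parameter:
   returns the quotient and stores the remainder through rem. */
static int divmod(int a, int b, int *rem)
{
    *rem = a % b;
    return a / b;
}

/* Stand-in for the Scheme multiple values:
   the normal return value first, then each ___out value. */
struct divmod_values {
    int quotient;
    int remainder;
};

/* Sketch of a wrapper for `___out int *rem`: allocate temporary
   storage, pass its address, then read back what the callee stored. */
static struct divmod_values divmod_wrapper(int a, int b)
{
    int tmp;                 /* temporary; its initial value is unspecified */
    struct divmod_values v;
    assert(b != 0);
    v.quotient = divmod(a, b, &tmp);
    v.remainder = tmp;
    return v;
}
```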

Function-arguments may be preceded by ___length(ID), where ID designates the name of another argument that must refer to a number vector or string argument. The value of the former argument will be computed at run-time and thus can be omitted:

(require-extension srfi-4)
(bind* #<<EOF
double sumarray(double *arr, ___length(arr) int len)
{
  double sum = 0;
  while(len--) sum += *(arr++);
  return sum;
}
EOF
)
(print (sumarray (f64vector 33 44 55.66)))

The length argument may be positioned anywhere in the argument list. Length markers may only be specified for arguments passed as SRFI-4 number vectors, byte-vectors (as provided by the lolevel library unit) or strings.
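In C terms, a ___length argument behaves roughly like the sketch below: the caller provides only the vector, and the wrapper fills in the element count. The struct f64vec and the function sumarray_wrapper are hypothetical stand-ins for an SRFI-4 f64vector and for the wrapper bind generates; sumarray is the function from the example above.

```c
#include <assert.h>
#include <stddef.h>

/* The C function from the example above. */
static double sumarray(double *arr, int len)
{
    double sum = 0;
    while (len--) sum += *(arr++);
    return sum;
}

/* Hypothetical stand-in for an f64vector: data pointer plus element count. */
struct f64vec {
    double *data;
    size_t  count;
};

/* Sketch of a wrapper for `___length(arr) int len`: only the vector is
   supplied, and the length argument is computed from it at run time. */
static double sumarray_wrapper(struct f64vec v)
{
    assert(v.data != NULL || v.count == 0);
    return sumarray(v.data, (int)v.count);
}
```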

Structure and union definitions containing actual field declarations generate getter procedures (and SRFI-17 setters for fields declared ___mutable, or for all fields when the mutable-fields option has been enabled). The names of these procedures are computed by concatenating the struct (or union) name, a hyphen ("-") and the field name. Structure definitions with fields may not be used in positions where a type specifier is normally expected. The field accessors operate on struct/union pointers only. Additionally, a zero-argument procedure named make-<structname> will be generated that allocates enough storage to hold an instance of the structure (or union). Prefixing the definition with ___abstract will omit the creation procedure.

(bind* #<<EOF
struct My_struct { int x; ___mutable float y; };
typedef struct My_struct My_struct;
My_struct *make_struct(int x, float y) 
{
  My_struct *s = (My_struct *)malloc(sizeof(My_struct));
  s->x = x;
  s->y = y;
  return s;
}
EOF
)

will generate the following definitions:

(make-My_struct) -> PTR
(My_struct-x PTR) -> INT
(My_struct-y PTR) -> FLOAT
(set! (My_struct-y PTR) FLOAT)
(make_struct INT FLOAT) -> PTR
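At the C level, the generated procedures amount to simple pointer dereferences. The sketch below spells this out with hypothetical C function names (bind generates Scheme procedures, not these C functions), and it uses malloc() for make_My_struct only as an assumption; the storage bind actually allocates may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as the example; the ___mutable marker is omitted here. */
struct My_struct { int x; float y; };
typedef struct My_struct My_struct;

/* (make-My_struct) -> PTR: storage for one instance, fields uninitialized. */
static My_struct *make_My_struct(void)
{
    return (My_struct *)malloc(sizeof(My_struct));
}

/* (My_struct-x PTR) -> INT: accessors take a pointer, never a struct value. */
static int My_struct_x(My_struct *s)
{
    assert(s != NULL);
    return s->x;
}

/* (My_struct-y PTR) -> FLOAT and (set! (My_struct-y PTR) FLOAT):
   y gets a setter because it was declared ___mutable. */
static float My_struct_y(My_struct *s)
{
    assert(s != NULL);
    return s->y;
}

static void My_struct_y_set(My_struct *s, float y)
{
    assert(s != NULL);
    s->y = y;
}
```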

As of version 1.0, nested structs are supported. Nested unions are not, but pointers to both structs and unions are.

Mutable struct-members of type c-string or c-string* will copy the passed argument string when assigned (using strdup(3)), since data may move in subsequent garbage collections.
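Here is a C sketch of such a setter, with a hypothetical struct Person and setter Person_name_set. The copy is made with malloc() and memcpy(), which has the same effect as strdup(3) but stays within ISO C; whether bind frees a previously stored string is not covered by this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Person { char *name; };

/* Same effect as strdup(3), written with ISO C calls only. */
static char *copy_string(const char *src)
{
    size_t n = strlen(src) + 1;   /* include the terminating NUL */
    char *dst = malloc(n);
    if (dst != NULL)
        memcpy(dst, src, n);
    return dst;
}

/* Hypothetical setter for a ___mutable c-string field: the field gets a
   private copy, so it stays valid even if the Scheme string it came from
   is later moved by the garbage collector. */
static void Person_name_set(struct Person *p, const char *value)
{
    assert(p != NULL && value != NULL);
    p->name = copy_string(value);
}
```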

All specially handled tokens preceded with ___ are defined as C macros in the header file chicken.h and will usually expand into nothing, so they don't invalidate the processed source code.

C++ namespace declarations of the form namespace NAME { ... } are recognized but completely ignored.

Keep in mind that this is not a fully general C/C++ parser. Taking an arbitrary header file and feeding it to bind will in most cases not work or will generate ridiculous amounts of code. This FFI facility is meant for carefully written header files and for declarations embedded directly in Scheme code.

Syntactic forms

bind

[syntax] (bind STRING ...)

Parses the C code in STRING ... and expands into wrapper code providing access to the declarations defined in it.

bind*

[syntax] (bind* STRING ...)

Similar to bind, but also embeds the code in the generated Scheme expansion using foreign-declare.

bind-type

[syntax] (bind-type TYPENAME SCHEMETYPE [CONVERTARGUMENT [CONVERTRESULT]])

Declares a foreign type transformation, similar to define-foreign-type. There should be two to four arguments: a C typename, a Scheme foreign type specifier and optional argument- and result-value conversion procedures.

;;;; foreign type that converts to unicode (assumes 4-byte wchar_t):
;
; - Note: this is rather kludgy and is only meant to demonstrate the `bind-type'
;         syntax
(require-extension srfi-4 bind)
(define mbstowcs (foreign-lambda int "mbstowcs" nonnull-u32vector c-string int))
(define (str->ustr str)
  (let* ([len (string-length str)]
         [us (make-u32vector (add1 len) 0)] )
    (mbstowcs us str len)
    us) )
(bind-type unicode nonnull-u32vector str->ustr)
(bind* #<<EOF
static void foo(unicode ws)
{
  printf("%ls\n", ws);
}
EOF
)
(foo "this is a test!")

bind-opaque-type

[syntax] (bind-opaque-type TYPENAME SCHEMETYPE)

Similar to bind-type, but provides automatic argument- and result conversions to wrap a value into a structure:

(bind-opaque-type myfile (pointer "FILE"))
(bind "myfile fopen(char *, char *);")
(fopen "somefile" "r")   ==> <myfile>

(bind-opaque-type TYPENAME TYPE) is basically equivalent to (bind-type TYPENAME TYPE TYPE->RECORD RECORD->TYPE) where TYPE->RECORD and RECORD->TYPE are compiler-generated conversion functions that wrap objects of type TYPE into a record and back.

bind-file

[syntax] (bind-file FILENAME ...)

Reads the content of the given files and generates wrappers using bind. Note that FILENAME ... is not evaluated.

bind-file*

[syntax] (bind-file* FILENAME ...)

Reads the content of the given files and generates wrappers using bind*.

bind-rename

[syntax] (bind-rename CNAME SCHEMENAME)

Specifies how a particular C/C++ name should be renamed. CNAME is the C/C++ identifier occurring in the parsed text and SCHEMENAME is the name used in the generated wrapper code.

bind-rename/pattern

[syntax] (bind-rename/pattern REGEX REPLACEMENT)

Declares a renaming pattern to be used for C/C++ identifiers occurring in bound code. REGEX should be a string or SRE and REPLACEMENT a string, which may optionally contain back-references to matched sub-patterns.

bind-options

[syntax] (bind-options OPTION VALUE ...)

Enables various translation options, where OPTION is a keyword and VALUE is a value given to that option. Note that VALUE is not evaluated.

Possible options are:

export-constants
export-constants: BOOLEAN

Define global variables for constant-declarations (with #define or enum), making the constant available outside the current compilation unit.

class-finalizers
class-finalizers: BOOLEAN

Automatically generates calls to set-finalizer! so that instances of subsequently defined C++ class wrappers destroy their embedded C++ object once they are no longer referenced and have been garbage collected. Use this with care: if the embedded C++ object represented by a reclaimed coops instance is still in use by foreign code, unpredictable things will happen.

mutable-fields
mutable-fields: BOOLEAN

Specifies that all struct or union fields should generate setter procedures (by default, setters are generated only for fields declared ___mutable).

constructor-name
constructor-name: STRING

Specifies an alternative name for constructor methods (the default is constructor; the cplusplus-object extension defines a default method for it).

destructor-name
destructor-name: STRING

Specifies an alternative name for destructor methods (the default is destructor; the cplusplus-object extension defines a default method for it).

exception-handler
exception-handler: STRING

Defines C++ code to be executed when an exception is triggered inside a C++ class member function. The code should be one or more catch forms that perform any actions that should be taken in case an exception is thrown by the wrapped member function:

(require-extension bind coops cplusplus-object)
(bind-options exception-handler: "catch(...) { return 0; }")
(bind* #<<EOF
class Foo {
 public:
  Foo *bar(bool f) { if(f) throw 123; else return this; }
};
EOF
)
(define f1 (new <Foo>))
(print (bar f1 #f))
(print (bar f1 #t))
(delete f1)

will print <Foo> and #f, respectively.

full-specialization
full-specialization: BOOLEAN

Enables full specialization mode. In this mode all wrappers for functions, member functions and static member functions are created as fully specialized coops methods. This can be used to handle overloaded C++ functions properly. Only a certain set of foreign argument types can be mapped to coops classes, as listed in the following table:

Type                  Class
char                  <char>
bool                  <bool>
c-string              <string>
unsigned-char         <exact>
byte                  <exact>
unsigned-byte         <exact>
[unsigned-]int        <exact>
[unsigned-]short      <exact>
[unsigned-]long       <integer>
[unsigned-]integer    <integer>
float                 <inexact>
double                <inexact>
number                <number>
(enum _)              <exact>
(const T)             (as T)
(function ...)        <pointer>
c-pointer             <pointer>
(pointer _)           <pointer>
(c-pointer _)         <pointer>
u8vector              <u8vector>
s8vector              <s8vector>
u16vector             <u16vector>
s16vector             <s16vector>
u32vector             <u32vector>
s32vector             <s32vector>
f32vector             <f32vector>
f64vector             <f64vector>

All other foreign types are specialized as #t.

Full specialization can be enabled globally, or only for sections of code by enclosing them like this:

(bind-options full-specialization: #t)
(bind #<<EOF
...
int foo(int x);
int foo(char *x);
...
EOF
)
(bind-options full-specialization: #f)

Alternatively, member function definitions may be prefixed by ___specialize for specializing only specific members.

prefix
prefix: STRING

Sets a prefix that should be added to all generated Scheme identifiers. For example

(bind-options prefix: "mylib:")
(bind "#define SOME_CONST 42")

would generate the following code:

(define-constant mylib:SOME_CONST 42)

To switch prefixing off, use the value #f. Prefixes are not applied to class names.

default-renaming
default-renaming: STRING

Chooses a standard name transformation, converting underscores (_) to hyphens (-) and transforming CamelCase into camel-case. All uppercase characters are also converted to lowercase. The result is prefixed with the argument STRING (equivalent to the prefix option).
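The transformation can be approximated by the C function below. This is an assumption-laden sketch, not bind's implementation: the function default_rename is hypothetical, and it inserts a hyphen only where a lowercase letter or digit is followed by an uppercase letter, so the egg's handling of runs of capitals or of digits may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Approximation of default-renaming: '_' becomes '-', a hyphen is inserted
   where a lowercase letter or digit is followed by an uppercase letter,
   everything is lowercased, and PREFIX is prepended.
   Writes at most outsize bytes, including the terminating NUL. */
static void default_rename(const char *prefix, const char *name,
                           char *out, size_t outsize)
{
    size_t n = 0;
    const char *p;
    assert(outsize > 0);
    for (p = prefix; *p != '\0' && n + 1 < outsize; p++)
        out[n++] = *p;                       /* copy the prefix unchanged */
    for (p = name; *p != '\0' && n + 1 < outsize; p++) {
        unsigned char c = (unsigned char)*p;
        if (c == '_') {
            out[n++] = '-';
        } else {
            if (isupper(c) && p != name) {
                unsigned char prev = (unsigned char)p[-1];
                /* CamelCase boundary; only if hyphen, char and NUL fit */
                if ((islower(prev) || isdigit(prev)) && n + 2 < outsize)
                    out[n++] = '-';
            }
            out[n++] = (char)tolower(c);
        }
    }
    out[n] = '\0';
}
```

With these rules, CamelCase becomes camel-case, My_struct becomes my-struct, and getX with the prefix mylib: becomes mylib:get-x.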

foreign-transformer
foreign-transformer: PROCEDURE

Applies the supplied procedure to the generated bind-foreign-lambda forms before code is emitted. Because the procedure is evaluated at compile time, a procedure referenced by name must be defined with define-for-syntax. The procedure takes two arguments: a bind-foreign-lambda s-expression and a rename procedure (for hygienic macros). The car of the bind-foreign-lambda form is a renamed version of either foreign-lambda* or foreign-safe-lambda*.

 (define-for-syntax (my-transformer form rename)
   (pp form) `(lambda () #f)) ;; prints and disables all bindings!
 (bind-options foreign-transformer: my-transformer)
 (bind "void foo();")

foreign-transformer may be useful with large header files that require custom type conversion, where bind-type isn't flexible enough. For example, it is possible to write foreign transformers that allow returning structs by value by converting them to a blob or u8vector.

bind-foreign-lambdas are similar to foreign-lambdas, but use s-expressions instead of flat C strings to allow simple modification. See some of the tests for how these may be used: http://bugs.call-cc.org/browser/release/4/bind/trunk/tests.

bind-include-path

[syntax] (bind-include-path STRING ...)

Appends the paths given in STRING ... to the list of available include paths to be searched when an #include ... form is processed by bind.

Grammar

The parser understands the following grammar:

PROGRAM = PPCOMMAND
        | DECLARATION ";"

PPCOMMAND = "#define" ID [TOKEN ...]
          | "#ifdef" ID
          | "#ifndef" ID
          | "#else"
          | "#endif"
          | "#undef" ID
          | "#error" TOKEN ...
          | "#include" INCLUDEFILE
          | "#import" INCLUDEFILE
          | "#pragma" TOKEN ...

DECLARATION = FUNCTION
            | VARIABLE
            | ENUM
            | TYPEDEF
            | CLASS
            | CONSTANT
            | STRUCT
            | NAMESPACE
            | USING

STRUCT = ("struct" | "union") ID ["{" {["___mutable"] TYPE {"*"} ID {"," {"*"} ID}} "}"]

NAMESPACE = "namespace" ID "{" DECLARATION ... "}"

USING = "using" "namespace" ID

INCLUDEFILE = "\"" ... "\""
            | "<" ... ">"

FUNCTION = {"___safe" | "___specialize" | "___discard"} [STORAGE] TYPE ID "(" ARGTYPE "," ... ")" [CODE]
         | {"___safe" | "___specialize" | "___discard"} [STORAGE] TYPE ID "(" "void" ")" [CODE]

ARGTYPE = [IOQUALIFIER] TYPE [ID ["=" NUMBER]]

IOQUALIFIER = "___in" | "___out" | "___inout" | LENQUALIFIER

LENQUALIFIER = "___length" "(" ID ")"

VARIABLE = [STORAGE] ENTITY ["=" INITDATA]

ENTITY = TYPE ID ["[" ... "]"]

STORAGE = "extern" | "static" | "volatile" | "inline"

CONSTANT = "const" TYPE ID "=" INITDATA

ENUM = "enum" "{" ID ["=" (NUMBER | ID)] "," ... "}"

TYPEDEF = "typedef" TYPE ["*" ...] [ID]

TYPE = ["const"] BASICTYPE [("*" ... | "&" | "<" TYPE "," ... ">" | "(" "*" [ID] ")" "(" TYPE "," ... ")")]

BASICTYPE = ["unsigned" | "signed"] "int" 
          | ["unsigned" | "signed"] "char" 
          | ["unsigned" | "signed"] "short" ["int"]
          | ["unsigned" | "signed"] "long" ["int"]
          | ["unsigned" | "signed"] "___byte" 
          | "size_t"
          | "float"
          | "double"
          | "void"
          | "bool"
          | "___bool"
          | "___scheme_value"
          | "___scheme_pointer"
          | "___byte_vector"
          | "___pointer_vector"
          | "___pointer" TYPE "*"
          | "C_word"
          | "___fixnum"
          | "___number"
          | "___symbol"
          | "___u32"
          | "___s32"
          | "___s64"
          | "__int64"
          | "int64_t"
          | "uint32_t"
          | "uint64_t"
          | "struct" ID
          | "union" ID
          | "enum" ID
          | ID

CLASS = ["___abstract"] "class" ID [":" [QUALIFIER] ID "," ...] "{" MEMBER ... "}"

MEMBER = [QUALIFIER ":"] ["virtual"] (MEMBERVARIABLE | CONSTRUCTOR | DESTRUCTOR | MEMBERFUNCTION)

MEMBERVARIABLE = TYPE ID ["=" INITDATA]

MEMBERFUNCTION = {"___safe" | "static" | "___specialize" | "___discard"} TYPE ID "(" ARGTYPE "," ... ")" ["const"] ["=" "0"] [CODE]
               | {"___safe" | "static" | "___specialize" | "___discard"} TYPE ID "(" "void" ")" ["const"] ["=" "0"] [CODE]

CONSTRUCTOR = ["___safe"] ["explicit"] ID "(" ARGTYPE "," ... ")" [BASECONSTRUCTORS] [CODE]

DESTRUCTOR = ["___safe"] "~" ID "(" ["void"] ")" [CODE]

QUALIFIER = ("public" | "private" | "protected")

NUMBER = <a C integer or floating-point number, in decimal, octal or hexadecimal notation>

INITDATA = <everything up to end of chunk>

BASECONSTRUCTORS = <everything up to end of chunk>

CODE = <everything up to end of chunk>

The following table shows how argument-types are translated:

C type                  Scheme type
[unsigned] char         char
[unsigned] short        [unsigned-]short
[unsigned] int          [unsigned-]integer
[unsigned] long         [unsigned-]long
___u32                  unsigned-integer32
___s32                  integer32
___s64                  integer64
int64_t                 integer64
__int64                 integer64
uint32_t                unsigned-integer32
uint64_t                unsigned-integer64
float                   float
double                  double
size_t                  unsigned-integer
bool                    int
___bool                 int
___fixnum               int
___number               number
___symbol               symbol
___scheme_value         scheme-object
C_word                  scheme-object
___scheme_pointer       scheme-pointer
char *                  c-string
signed char *           s8vector
[signed] short *        s16vector
[signed] int *          s32vector
[signed] long *         s32vector
unsigned char *         u8vector
unsigned short *        u16vector
unsigned int *          u32vector
unsigned long *         u32vector
float *                 f32vector
double *                f64vector
___byte_vector          byte-vector
___pointer_vector       pointer-vector
CLASS *                 (instance CLASS <CLASS>)
CLASS &                 (instance-ref CLASS <CLASS>)
TYPE *                  (pointer TYPE)
TYPE &                  (ref TYPE)
TYPE<T1, ...>           (template TYPE T1 ...)
TYPE1 (*)(TYPE2, ...)   (function TYPE1 (TYPE2 ...))

The following table shows how result-types are translated:

C type                  Scheme type
void                    void
[unsigned] char         char
[unsigned] short        [unsigned-]short
[unsigned] int          [unsigned-]integer
[unsigned] long         [unsigned-]long
___u32                  unsigned-integer32
___s32                  integer32
___s64                  integer64
int64_t                 integer64
__int64                 integer64
uint64_t                unsigned-integer64
__uint64                unsigned-integer64
float                   float
double                  double
size_t                  unsigned-integer
bool                    bool
___bool                 bool
___fixnum               int
___number               number
___symbol               symbol
___scheme_value         scheme-object
char *                  c-string
TYPE *                  (c-pointer TYPE)
TYPE &                  (ref TYPE)
TYPE<T1, ...>           (template TYPE T1 ...)
TYPE1 (*)(TYPE2, ...)   (function TYPE1 (TYPE2 ...))
CLASS *                 (instance CLASS <CLASS>)
CLASS &                 (instance-ref CLASS <CLASS>)

The ___pointer argument marker disables automatic simplification of pointers to number-vectors and C-strings: normally arguments of type int * are handled as SRFI-4 s32vector number vectors. To force treatment as a pointer argument, precede the argument type with ___pointer. The same applies to strings: char * is by default translated to the foreign type c-string, but ___pointer char * is translated to (c-pointer char).

C notes

Foreign variable definitions for macros are not exported from the current compilation unit, but definitions for C variables and functions are.

bind does not embed the text into the generated C file; use bind* for that.

Functions with variable number of arguments are not supported.

C++ notes

Each C++ class defines a coops class, which is a subclass of <c++-object>. Instances of this class contain a single slot named this, which holds a pointer to a heap-allocated C++ instance. The name of the coops class is obtained by putting the C++ class name between angle brackets (<...>). coops classes are not seen by C++ code.

The C++ constructor is invoked by the constructor generic, which accepts as many arguments as the constructor. If no constructor is defined, a default-constructor will be provided taking no arguments.

To release the storage allocated for a C++ instance, invoke the destructor generic or the delete convenience procedure (the name of the generic can be changed with the destructor-name option).

Static member functions are wrapped in a Scheme procedure named <class>::<member>.

Member variables and non-public member functions are ignored.

Virtual member functions are not seen by C++ code. Overriding a virtual member function with a coops method will not work when the member function is called by C++.

Operator functions and default arguments are not supported.

Exceptions must be explicitly handled by user code and may not be thrown beyond an invocation of C++ by Scheme code.

Generally, the following interface to the creation and destruction of wrapped C++ instances is provided:

constructor

[procedure] (constructor CLASS INITARGS)

A generic function that, when invoked, constructs a C++ object represented by the class CLASS, which should inherit from <c++-object>. INITARGS is a list of arguments that should be passed to the constructor and which must match the argument types for the wrapped constructor.

destructor

[procedure] (destructor OBJECT)

A generic function that, when invoked, destroys the wrapped C++ object OBJECT.

new
[procedure] (new CLASS ARG1 ...)

A convenience procedure that invokes the constructor generic function for CLASS.

delete
[procedure] (delete OBJECT)

A convenience procedure that invokes the destructor generic function on OBJECT.

Authors

felix winkelmann and Kristian Lein-Mathisen

License

This code is placed into the public domain.

Version History

1.2
fix handling of u64 type, strdup c-strings assigned to mutable struct-members (contributed by Andrei Barbu)
1.0
added support for passing structures by value (contributed by kristianlm)
0.99.1
bugfixes in bind-options (thanks to kristianlm)
0.98
removed stupid check that caused import of modules that use bind to fail in csi (thanks to Christian Kellermann)
0.97
fixed broken handling of ___pointer
0.95
added support for hexadecimal escapes in character constants (thanks to Ross Lonstein)
0.92
fixed use of deprecated pointer type in CHICKEN 4.6.4; fixed testcase
0.91
various bugfixes (thanks to Christian Kellermann)
0.9
using bind in interpreted code shows suitable error message
0.7
added regex dependency
0.6
commit screwup
0.6
fixed bug in variable access procedure generation
0.5
support for pointer-vectors and unsigned int32/int64 (the latter might not work)
0.4
added renaming options to chicken-bind
0.3
fixed typo in bind-translator.scm (thanks to Shawn Rutledge)
0.2
fixed bug in chicken-bind and added -parse option
0.1
imported modified easyffi