crunch (historical revision 20568) - The CHICKEN Scheme wiki

crunch

Introduction

crunch compiles a severely restricted statically typed subset of R5RS Scheme to C++. It can be used to generate standalone executables or code embedded into Scheme programs.

This extension is highly experimental and likely to contain many bugs and incomplete functionality.

To use crunch in your Scheme code, simply wrap toplevel forms to be compiled in a (crunch ...) expression. All toplevel procedure definitions are accessible as global or local (depending on the context where the crunch form occurs) procedures callable from Scheme. The crunch macro can only be used in compiled code. To use the macro, put

 (import crunch)

in your code.
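As a minimal sketch of the embedding described above (the file name square.scm and the procedure name square-int are made up for illustration), a file using the macro might look like this. The `::int` suffix follows the declaration convention described under "The type system".

```scheme
;; square.scm -- hypothetical file name, used only for illustration.
;; The crunch macro requires compiled code in C++ mode:  csc -c++ square.scm
(import crunch)

(crunch
  ;; The parameter really is named n::int; the suffix declares its type.
  (define (square-int n::int)
    (* n::int n::int)))

;; square-int is now an ordinary procedure callable from Scheme.
(print (square-int 12))
```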

Alternatively, the chicken-crunch program can be used to translate and compile code in the crunch Scheme dialect into C++. The generated code has no dependencies. Only the header file crunch.h must be available in your C++ compiler's include path. When installing this extension with chicken-install, the file will be located in your default include path, usually $PREFIX/include.

The compiler can also be used through its procedural API (see crunch-compile). In that case, load the runtime part of the compiler with

 (require-extension crunch-compiler)

Crunched procedures are in every respect identical to C/C++ functions called via the usual CHICKEN foreign function interface. Crunch does not know anything about Scheme data or memory management. Translated code can call back into Scheme (see define-crunch-callback). Callbacks are usually detected automatically, and the generated Scheme wrapper function for a crunched procedure will be of the appropriate type, if required.

Crunch uses its own macro expander, a modified version of Al Petrofsky's alexpander, an R5RS-compliant implementation of syntax-rules macros.

No garbage collector is used. All dynamically allocated data (strings and number vectors) are managed using reference-counting.

The dialect of R5RS Scheme supported is extremely limited. See Bugs and limitations for more information.

Note that if you use the crunch macro in your code, you must compile the file generated by chicken in C++ mode (just pass -c++ to csc when compiling).

To get maximum performance, inlining must be enabled in the C++ compiler when compiling crunch-generated code. The default optimization options do not enable inlining unless the default C compiler options have been overridden during installation of CHICKEN. Passing -C -O3 to csc for crunched code will usually optimize the C++ code considerably.

Programming interface

crunch-compile

[procedure] (crunch-compile EXPRESSION [PORT debug: DBGMODE entry-point: SYMBOL])

Compiles the toplevel expression EXPRESSION into C++ code, writing the generated code to PORT, which defaults to the value of (current-output-port). If DBGMODE is given, debugging output will be written to the current output port. DBGMODE can be a boolean or a number between 1 and 3. Debug mode 1 shows some information about each compiled procedure; debug mode 3 generates extensive diagnostic output about the type-inferencing process and the expanded code.

If the entry-point name SYMBOL is given, then the (normally hidden) toplevel variable of the same name holding a pointer to the associated C++ function can be accessed from C/C++ code, i.e. it is exposed under the same name. Note that the exposed variable is a pointer to a function.

Each invocation of crunch-compile creates its own private namespace; global variables are not visible in subsequent compilation runs in the same process. Syntax definitions, however, persist across invocations.
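The following sketch shows one way the procedural API might be called, based only on the signature given above. The procedure name twice is hypothetical, and the expression is passed as a quoted datum.

```scheme
(require-extension crunch-compiler)

;; Translate one toplevel definition to C++ and write it to stdout.
;; `twice` is a hypothetical name chosen for this sketch.
(crunch-compile
  '(define (twice n::int) (* n::int 2))
  (current-output-port)
  debug: 1              ; brief information about each compiled procedure
  entry-point: 'twice)  ; expose the function pointer under this name
```

Because each invocation has a private namespace, a second call to crunch-compile in the same process would not see twice.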

crunch-expand

[procedure] (crunch-expand EXPRESSION)

Expands all macros in the given toplevel expression and returns the expansion.

crunch

[syntax] (crunch EXPRESSION ...)

Compiles the given toplevel expressions and expands into a set of function definitions and an invocation of the compiled toplevel expressions in EXPRESSION. The form can be used in a definition context but ends in a non-definition form (and so, with some macro systems, cannot be followed by other definitions). Calls to Scheme callbacks are detected automatically and generate the appropriate foreign-safe-lambda definition. The result of the executed toplevel code is unspecified.

define-crunch-primitives

[syntax] (define-crunch-primitives ((NAME ARGTYPE ...) -> RESULTTYPE [C-NAME]) ...)

Defines additional primitives with the given names, argument types and result types. If C-NAME is given, it specifies the name of the actual C/C++ function to be called. Otherwise NAME is used.
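A hedged sketch of declaring a primitive without a C-NAME, so that NAME itself is used as the C function name. It assumes that hypot(3) is declared by the headers included in the generated C++ code, which this page does not confirm; distance is a hypothetical procedure name.

```scheme
(import crunch)

;; Assumption: hypot(3) is visible to the generated C++ code.
;; No C-NAME is given, so the C function called is also named hypot.
(define-crunch-primitives
  ((hypot double double) -> double))

(crunch
  ;; distance is a hypothetical name used only in this sketch.
  (define (distance x::double y::double)
    (hypot x::double y::double)))

(print (distance 3.0 4.0))
```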

define-crunch-callback

[syntax] (define-crunch-callback (NAME (ARGFTYPE1 VAR1) ...) RESULTFTYPE BODY ...)

Equivalent to define-external, but makes the callback accessible in subsequent translations of crunch code.

Note that you have to pass -emit-external-prototypes-first to csc (or chicken) when you use crunch callbacks to place function prototypes for the callbacks in front of code generated by crunch.
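As an illustration of the callback mechanism, the sketch below defines a callback and calls it from crunched code. The names callback.scm, report and count-up are hypothetical; the argument list follows the define-external style given in the signature above.

```scheme
;; callback.scm -- hypothetical file name.
;; Compile with:  csc -c++ -emit-external-prototypes-first callback.scm
(import crunch)

;; A Scheme procedure that crunched code may call back into.
(define-crunch-callback (report (int n)) void
  (print "crunched code reported " n))

(crunch
  ;; Calls the callback once for each i from 0 up to limit - 1,
  ;; then returns the limit so the result has a supported type.
  (define (count-up limit::int)
    (do ((i 0 (add1 i)))
        ((>= i limit::int))
      (report i))
    limit::int))

(count-up 3)
```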

Standalone compiler

The program chicken-crunch can be used to generate a standalone program or module that has no CHICKEN dependencies.

 usage: chicken-crunch OPTION | FILENAME ...
   
   -h            show this message
   -o FILENAME   set output filename
   -d            enable debug output
   -dd           enable more debug output
   -ddd          enable massive debug output
   -cc CC        select C++ compiler (default: "c++")
   -expand       only show code after expansion
   -entry NAME   set entry-point procedure
   -translate    only generate C++, don't compile
   
   All other options (arguments beginning with "-") are passed to
   the C++ compiler. FILENAME may be "-", which reads source code
   from stdin.

Provided the file crunch.h is in the include path, the generated C++ code can be compiled by itself. To link, you may have to add the -lm switch to the linker, depending on the platform on which you are compiling the code.
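A small standalone program in the crunch dialect might look like the sketch below. The file names are hypothetical, and the command lines in the comments use only the options listed in the usage message above.

```scheme
;; fact.scm -- hypothetical file name.
;; Build a standalone executable:   chicken-crunch fact.scm -o fact
;; Only emit the C++ translation:   chicken-crunch -translate fact.scm -o fact.cpp

;; Self-recursive with an accumulator, so the recursive call is a tail call.
(define (fact-iter n::int acc::int)
  (if (<= n::int 1)
      acc::int
      (fact-iter (- n::int 1) (* acc::int n::int))))

(display (fact-iter 10 1))
(newline)
```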

The type system

Crunch performs type-inference to find out the types of local and global variables. It currently knows about these types:

Crunch type	C type	Description
`int` `short` `long`	`int` `short` `long`	integer numbers
`float` `double`	`float` `double`	floating point numbers
`bool`	`bool`	boolean type
`char`	`char`	characters
`void`	`void`	the type of the "unspecified" value
`c-string`	`char *`	strings
`blob`	`void *`	a shapeless byte sequence
`c-pointer`	`void *`	an opaque pointer
`u8vector` `s8vector` `u16vector` `s16vector` `u32vector` `s32vector` `f32vector` `f64vector`	`unsigned char *` `signed char *` `unsigned short *` `short *` `unsigned int *` `int *` `float *` `double *`	SRFI-4 homogeneous number vectors

Important: callbacks are likely to trigger a garbage collection, which will invalidate references to number-vectors or strings allocated in normal Scheme code. This does not apply to data allocated inside crunched code, which is not subject to garbage collection.

Variables defined with define or set! or bound with let or in a lambda list can be declared to have a particular type by suffixing them with :: followed by a typename:

  (crunch
    (let ((a::int (* 8 (sin 1))))
      (display a::int)))               ; shows "8"

Note that the name of the variable really is a::int, not a. You usually don't need these declarations, though.

Note also the absence of any other data types, in particular lists, vectors or record structures.

Crunched functions may return results of the following types:

 char
 int
 short
 long
 float
 double
 c-string
 c-pointer

Polymorphic procedures are not supported.

Available syntax

The following non-standard macros are provided:

 cond-expand
 when
 unless
 switch
 rec

cond-expand recognizes the feature identifiers crunch, srfi-0, highlevel-macros and syntax-rules. When code is compiled to a standalone program with chicken-crunch, the feature identifier crunch-standalone is defined as well.
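A sketch of how these feature identifiers could be used to select code per build mode. Since a standalone build defines both crunch and crunch-standalone, the more specific clause is listed first; the else clause is only reached when the same source is read by a Scheme system that defines neither.

```scheme
(cond-expand
  (crunch-standalone
    (display "built with chicken-crunch"))
  (crunch
    (display "compiled inside a crunch form"))
  (else
    (display "plain Scheme")))
(newline)
```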

Available primitives

All primitives take a fixed number of arguments; optional or "rest" arguments are not supported. Primitives may not be redefined. Uses of primitives in non-operator position are treated as (lambda (tmp1 ...) (<primitive> tmp1 ...)).

Argument type abbreviations:

O O1 O2	any data object
X Y	number
N N1 N2	integer
K K1 K2	positive integer
R R1 R2	inexact number
S S1 S2	string
C C1 C2	character
B	blob
U8 S8 U16 S16 U32 S32 F32 F64	SRFI-4 number vector
P	pointer

The following R5RS procedures are provided:

 (not O)

 (eq? O1 O2)
 (eqv? O1 O2)
 (equal? O1 O2)

 (+ X Y)
 (- X Y)
 (* X Y)
 (/ X Y)
 (= X Y)
 (> X Y)
 (< X Y)
 (>= X Y)
 (<= X Y)
 (abs X)
 (acos R)
 (asin R)
 (atan R)
 (ceiling X)
 (cos R)
 (display O)
 (even? N)
 (exact? X)
 (exact->inexact X)
 (exp R)
 (expt R1 R2)
 (floor X)
 (inexact? X)
 (inexact->exact X)
 (integer? X)
 (log R)
 (max X Y)
 (min X Y)
 (modulo N1 N2)
 (negative? X)
 (odd? N)
 (positive? X)
 (quotient N1 N2)
 (remainder N1 N2)
 (round X)
 (sin R)
 (sqrt X)
 (tan R)
 (truncate X)
 (zero? X)

max, min and expt are not exactness preserving. expt always returns an inexact result.

 (char=? C1 C2)
 (char>? C1 C2)
 (char<? C1 C2)
 (char>=? C1 C2)
 (char<=? C1 C2)
 (char->integer C)
 (char-alphabetic? C)
 (char-ci=? C1 C2)
 (char-ci>? C1 C2)
 (char-ci<? C1 C2)
 (char-ci>=? C1 C2)
 (char-ci<=? C1 C2)
 (char-downcase C)
 (char-lower-case? C)
 (char-numeric? C)
 (char-upper-case? C)
 (char-upcase C)
 (char-whitespace? C)
 (integer->char K)

 (number->string X K)
 (make-string N C)
 (string=? S1 S2)
 (string>? S1 S2)
 (string<? S1 S2)
 (string>=? S1 S2)
 (string<=? S1 S2)
 (string->number S K)
 (string-ci=? S1 S2)
 (string-ci>? S1 S2)
 (string-ci<? S1 S2)
 (string-ci>=? S1 S2)
 (string-ci<=? S1 S2)
 (string-append S1 S2)
 (string-copy S)
 (string-fill! S1 C)
 (string-length S)
 (string-ref S K)
 (string-set! S K C)
 (substring S K1 K2)

string->number does not detect invalid numerical syntax and simply wraps strtol(3)/strtod(3). If a radix different from 10 is given, the result will always be converted with strtol(3).

number->string ignores the radix argument if the converted number is inexact.

 (display X)
 (newline)
 (write-char C)

write-char, display and newline always write to stdout.

Non-R5RS procedures (see the User's Manual for more information):

 (add1 X)
 (atan2 R1 R2)
 (arithmetic-shift N1 N2)
 (bitwise-and N1 N2)
 (bitwise-ior N1 N2)
 (bitwise-not N)
 (bitwise-xor N1 N2)
 (sub1 X)

 (f32vector-length F32)
 (f32vector-ref F32 K)
 (f32vector-set! F32 K R)
 (f64vector-length F64)
 (f64vector-ref F64 K)
 (f64vector-set! F64 K R)
 (make-f32vector K R)
 (make-f64vector K R)
 (make-s16vector K N)
 (make-s32vector K N)
 (make-s8vector K N)
 (make-u16vector K1 K2)
 (make-u32vector K1 K2)
 (make-u8vector K1 K2)
 (s16vector-length S16)
 (s16vector-ref S16 K)
 (s16vector-set! S16 K N)
 (s32vector-length S32)
 (s32vector-ref S32 K)
 (s32vector-set! S32 K N)
 (s8vector-length S8)
 (s8vector-ref S8 K)
 (s8vector-set! S8 K N)
 (subf32vector F32 K1 K2)
 (subf64vector F64 K1 K2)
 (subs16vector S16 K1 K2)
 (subs32vector S32 K1 K2)
 (subs8vector S8 K1 K2)
 (subu16vector U16 K1 K2)
 (subu32vector U32 K1 K2)
 (subu8vector U8 K1 K2)
 (u16vector-length U16)
 (u16vector-ref U16 K)
 (u16vector-set! U16 K1 K2)
 (u32vector-length U32)
 (u32vector-ref U32 K)
 (u32vector-set! U32 K1 K2)
 (u8vector-length U8)
 (u8vector-ref U8 K)
 (u8vector-set! U8 K1 K2)

 (blob->f32vector B)
 (blob->f32vector/shared B)
 (blob->f64vector B)
 (blob->f64vector/shared B)
 (blob->s16vector B)
 (blob->s16vector/shared B)
 (blob->s32vector B)
 (blob->s32vector/shared B)
 (blob->s8vector B)
 (blob->s8vector/shared B)
 (blob->string B)
 (blob->string/shared B)
 (blob->u16vector B)
 (blob->u16vector/shared B)
 (blob->u32vector B)
 (blob->u32vector/shared B)
 (blob->u8vector B)
 (blob->u8vector/shared B)
 (f32vector->blob F32)
 (f32vector->blob/shared F32)
 (f64vector->blob F64)
 (f64vector->blob/shared F64)
 (s16vector->blob S16)
 (s16vector->blob/shared S16)
 (s32vector->blob S32)
 (s32vector->blob/shared S32)
 (s8vector->blob S8)
 (s8vector->blob/shared S8)
 (string->blob S)
 (string->blob/shared S)
 (u16vector->blob U16)
 (u16vector->blob/shared U16)
 (u32vector->blob U32)
 (u32vector->blob/shared U32)
 (u8vector->blob U8)
 (u8vector->blob/shared U8)

The .../shared conversion procedures return data objects that share the actual storage with the argument objects, so a write through one object is visible through the other.
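The sketch below illustrates storage sharing with the /shared conversions listed above. The procedure name shared-demo is hypothetical, and the vector is allocated inside crunched code so that its length is known.

```scheme
(import crunch)

(crunch
  ;; shared-demo is a hypothetical name used only in this sketch.
  (define (shared-demo)
    (let* ((v (make-u8vector 4 0))
           (b (u8vector->blob/shared v))
           (w (blob->u8vector/shared b)))
      ;; w and v are described as sharing storage, so this write
      ;; through w should be observable through v.
      (u8vector-set! w 0 42)
      (u8vector-ref v 0))))

(print (shared-demo))
```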

 (flush-output)

 (void)
 (error S)
 (exit N)
 (argc)
 (argv-ref K)

error shows a message and invokes abort(3). argc returns the number of arguments passed to the process (including the program name) and argv-ref returns the command line argument with the given index (or the program name, when the index is zero).
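For example, toplevel code in a chicken-crunch standalone program could print its command line like this (a sketch using only the primitives described above):

```scheme
;; Index 0 is the program name; argc counts it as well.
(do ((i 0 (add1 i)))
    ((>= i (argc)))
  (display (argv-ref i))
  (newline))
```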

 (pointer-u8-ref P N)
 (pointer-s8-ref P N) 
 (pointer-u16-ref P N) 
 (pointer-s16-ref P N) 
 (pointer-u32-ref P N) 
 (pointer-s32-ref P N) 
 (pointer-f32-ref P N) 
 (pointer-f64-ref P N) 
 (pointer-u8-set! P N1 N2) 
 (pointer-s8-set! P N1 N2) 
 (pointer-u16-set! P N1 N2) 
 (pointer-s16-set! P N1 N2) 
 (pointer-u32-set! P N1 N2) 
 (pointer-s32-set! P N1 N2) 
 (pointer-f32-set! P N R) 
 (pointer-f64-set! P N R)

Notes

Pass -DDBGALLOC to the C++ compiler (either through chicken-crunch or to csc via -C -DDBGALLOC) to see log messages about the allocation and de-allocation of dynamic number vectors or strings.
Runtime errors invoke abort(3) and thus cannot be caught.

Bugs and limitations

Lexical scope is not supported: only references to global variables and variables local to the current lambda construct (including let-bound variables) are visible. Expressions of the form ((lambda (...) ...) ...) are converted into the corresponding let construct.
Local procedures are not available.
letrec is not supported (it makes no sense without local procedures).
Continuations are not supported.
Multiple values are not supported.
Tail calls are only detected in self-recursive functions.
Rest-arguments (dotted lambda lists) are not supported.
Numeric overflow of fixnum operations is not detected.
Nearly no error checks are made at runtime.
Named let is always assumed to be a looping construct; calls to the loop variable must be in tail position.
do and named let loops always return an unspecified value.
The correctness of the C++ template code is unclear. C++ is insane.
If a homogeneous number vector or string is passed from Scheme to C++ code generated by crunch, then the length of the passed array is not known, and the associated ...-length primitive and primitives that require the length of the vector will abort.
Type-related errors do not always produce particularly useful context information.
Error messages are generally pretty bad.

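Several of these limitations shape how loops are written. Since do and named let loops return an unspecified value, a result has to be accumulated in a let-bound variable of the enclosing lambda and returned after the loop. A sketch, with the hypothetical name sum-to:

```scheme
(import crunch)

(crunch
  (define (sum-to n::int)
    ;; acc is let-bound inside the current lambda, so the loop body can see it.
    (let ((acc 0))
      (do ((i 1 (add1 i)))
          ((> i n::int))
        (set! acc (+ acc i)))
      ;; The do loop's own value is unspecified; return the accumulator.
      acc)))

(print (sum-to 10))
```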
Examples

(use crunch)

(crunch
  (define (string-reverse str)
    (let* ((n (string-length str))
           (s2 (make-string n #\space)))
      (do ((i 0 (add1 i)))
          ((>= i n))
        (string-set! s2 (sub1 (- n i)) (string-ref str i)))
      s2)))

(print (string-reverse "this is a test!"))

License

Copyright (c) 2007-2009, Felix L. Winkelmann
The "alexpander" is Copyright (c) 2002-2004, Al Petrofsky

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

 Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer. 
 Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the 
   documentation and/or other materials provided with the distribution. 
 Neither the name of the author nor the names of its contributors
   may be used to endorse or promote products derived from this
   software without specific prior written permission. 

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Version history

0.7.6: fixed bug in installation script (thanks to Jim Pryor)
0.7.5: fixed bug related to callbacks with void result type
0.7.4: removed unused test files
0.7.3: two bugfixes (Thanks to Jeronimo)
0.7.2: fixed silly mistake
0.7.1: fixed bug in setup script
0.7: ported to CHICKEN 4
0.6: updated to newest alexpander
0.5: fixed buggy formatting directive
0.4: support for libarena by Ivan Raikov
0.3: fixed bugs in character handling [thanks to Alex Shinn]
0.2: fixed bugs in naming of char->integer and integer->char
0.1: initial release
