Programming for Performance
Introduction
Compiled CHICKEN code can be pretty fast - depending on programming style and problem set CHICKEN outperforms most Scheme systems, provided the tools and options available to speed up your code are used in a wise manner.
First of all, a disclaimer: standard R5RS Scheme can not be executed efficiently. Period. Every implementation must bend the semantics of the language in more or less subtle ways to do a minimum of optimization and there are several approaches to do this:
- drop expensive operations from the language - this is what Bigloo does: it only supports continuations when they are explicitly enabled and doing so will impact overall performance substantially
- another expensive language feature is support for optimizing tail calls (Bigloo avoids this as well, as does Stalin)
- generic arithmetic requires dispatching according to the types of the arguments of an operation - type-specific (monomorphic) operations like fx+ can reduce the overhead
- perform flow-analysis or type-inference - the latter is usually somewhat stricter and may not allow variables to change their type during the execution of the code
- do not allow redefinition of standard procedures - this is the most crucial requirement, as all further special treatment of primitive operations depends on the compiler being able to assume that + still means addition everywhere
That last point is important: nearly all Scheme implementations provide some way to make sure primitives are known, either by using a module system (CHICKEN, Racket, Bigloo), performing heavyweight flow analysis (Stalin), or by using declarations (Gambit, CHICKEN). An interesting way of dealing with this problem is used by Gambit: "speculative inlining" emits checks that make sure a variable naming a standard procedure has not been redefined. This allows both redefinitions and specialization of primitive calls but may lead to bloat of the executable and has a certain runtime cost.
Optimization options
If you want to make sure your code runs fast, the first measure should be to crank up the optimization level. There are 5 levels where each level enables an additional set of optimizations:
- Level 1 ("-O1")
- optimize small, non-allocating procedures when generating C code
- Level 2 ("-O2")
- perform general inlining of small known procedures
- Level 3 ("-O3")
- assume definitions in the compilation unit are not modified from other compilation units - this allows inlining toplevel procedures, provided they are small enough; additionally "specialization" is enabled (see below)
- Level 4 ("-O4")
- compile in unsafe mode, so runtime-checks are removed which will result in undefined behaviour (crashes) if uncaught errors occur at runtime
- Level 5 ("-O5")
- do everything to get the fastest code possible, disabling interrupts (threads + signal handling), remove all debugging information and compile in "block" mode
The compiler will by default always assume standard procedures (and a set of so called "extended" procedures) will not be redefined, see the FAQ for a list of these procedures. The compiler will also do a good deal of copy-propagation, constant folding and value-propagation, together with inlining of known procedures that are only called at one location in the code. These optimizations never change the semantics and so will not alter the behaviour of the program or conflict with R5RS.
Side note: the default debugging settings will emit "trace" code that keeps a call-trace during execution of the program to provide more usable error messages. Compile with "-d1" or "-d0" to avoid this as it has a runtime cost.
The so called "block" mode compiles in a "closed world" manner: by compiling in this mode you declare that the toplevel definitions are not visible from beyond the currently compiled file. This improves inlining and allows unused code to be removed.
Compiler syntax
Using syntax to perform hand-inlining is a common way of speeding up Scheme or Lisp code. CHICKEN provides define-inline which is slightly more convenient and supports "compiler syntax", similar to the compiler macros available in Common Lisp. Compiler syntax means you are defining one or more rewrite rules that should be applied to (explicit) calls to certain functions. As an example, consider the case of having two functions: one that packs up an argument and a complementary function that does the opposite. Defining
(define-compiler-syntax unpack (syntax-rules (pack) ((_ (pack x)) x))) (define-compiler-syntax pack (syntax-rules (unpack) ((_ (unpack x)) x)))
A call of the form
(pack (unpack foo))
will be changed to simply
foo
Note that when no rule matches, the call is compiled as it originally appeared in the code. Compiler syntax may be defined with a procedural macro using er-macro-transformer, in this case returning the original form (being eq?) will assume default processing. What can be done with this device is limited to syntactically visible special cases, but particularly macro-generated code may expose many regularities that can be easily handled in this manner. Compiler syntax can not be exported and is always local to the current compilation unit.
Type declarations and specialization
Scheme is a dynamically typed language: values carry their type information and type-checks are usually done at run-time which will always result in executables that run slower than their counterpart written in C, C++, ML or another statically typed language. Monomorphic operations specific to a certain value type still need to test their arguments of being of correct type, as long as type-information is not available statically, during compile-time.
CHICKEN supports a basic form of "local" flow-analysis that attempts to maintain type-information for local bindings and subexpressions of a procedure body. This analysis must naturally be conservative as global mutation and side-effects may break all assumptions over variable and expression types. Based on that type information, certain type-related errors can be caught, like passing an argument of a wrong type to a known library procedure. The current version of CHICKEN uses this information to replace calls to known library procedures with more efficient procedure calls or performs the equivalent code directly in-line, a process called "specialization".
The user can assists in giving the compiler type-information via declarations, e.g.:
(: mumble (float -> (list float string))) (define (mumble num) ...)
Here we declared mumble to be a procedure of one argument, a floating-point number, and returning a list of two elements (a floating-point number and a string). The type-syntax and declaration form is similar to the one used by Typed Racket, but is purely optional and does not treat the code as a separate dialect or sub-language. The type-system is straightforward with support for basic types, sequence types, a "union" (or) type and qualified types with type-variables and optional constraints.
Here is an example, the internal declaration of the type of the sort library procedure in the data-structures unit:
(: sort (forall (e (s (or (vector-of e) (list-of e)))) (s (e e -> *) -> s)))
sort takes a sequence which must be either a vector or a list, and a procedure taking two arguments of the element type of the argument sequence. It will return a result of the same type. As can be seen here, types can be pretty elaborate so abbreviations may be defined using define-type.
Type-declarations can be exposed to be used by other compilation units. The compiler option -emit-type-file writes the locally declared type-information to a file (usually having the ".types" extension. Loading this file explicitly with the -types option or implicitly via the use or require-extension forms will make the type-information available to client code. This means you can simply install a EXTENSIONNAME.types file along with your extension and type-information will be available to provide more opportunitites to catch statically detectable type errors or to speed up user code by taking advantage of known result- and argument types.
It is even possible to hook into the specialization machinery by using define-specialization, a feature similar to compiler-syntax but matching on type-information, similar to C++ templates:
(define-specialization (mumble (x fixnum))
(list x (number->string x)))
Here we declared that a call to mumble, when invoked with a fixnum argument (this must be a direct, syntactically visible call) will be replaced with the given expression (actually a call to a hidden global procedure, so no code-duplication will take place, unless the body is small enough to be subject to inlining).
The flow-analysis algorithm takes care not to introduce "false positives" and will usually infer less information than might at first sight be available. Explicit assignments make the analysis more difficult as will variables that have different types during the lifetime of the program. The compiler tries hard not to emit unsafe code in case type-declarations are violated, but in the end may fail to do so and it is the reponsibility of the programmer to not declare variables incorrectly. Using the -strict-types compiler option will go one step further and declare that variables have fixed types and declarations will not be violated. The compiler will in this case have a maximum of type information and can try to optimize more optimistically.
For an example of record-type definitions enhanced with type information, see typed-records which replaces the usual record definition forms like define-record with typed variants that will give much more type-information to the compiler improving error reporting and runtime performance.
Compiler-syntax and user-defined specializations allow manipulating code heavily up to the point of completely changing the meaning of the original program. This can be used to make code completely inscrutable but also allows tuning code without touching it, something that may be useful when you want to incorporate upstream code unchanged into a library.
functors
"Functors" can be used to define generic code for specialization over a set of bindings. Say you have a general algorithm and want to specialize it for various representations of data. A functor defines a higher order module: a module that can take other modules as arguments. So, for example:
(functor (dot-product (VS (vref vlen add multiply zero))) (dot-product) (import scheme chicken VS) (define (dot-product a b) (let ((alen (vlen a)) (blen (vlen b))) (assert (= alen blen)) (do ((i 0 (fx+ i 1)) (r (zero) (add r (multiply (vref a i) (vref b i))))) ((fx>= i alen) r))))) ;; "instantiate" the functor with a given implementation (module f64dp = dot-product (import scheme chicken) (use srfi-4) (define vlen f64vector-length) (define vref f64vector-ref) (define (zero) 0.0) (define multiply fp*) (define add fp+))
The dot-product functor takes a module that exports a number of operations and implements the scalar product of a vector. The second form above combines the definition of a module with the instantiation of the functor (see the manual for more information). The final result will be a module named f64dp that defines the scalar product of a SRFI-4 f64vector. Functors are somewhat like code generation factories. Note that nothing prevents the imports of the dot-product functor to be syntax.
Programming style
The style of coding naturally has a great impact on what the compiler is able to do with the code. As has been already described, assignments make flow-analysis harder and should be avoided. Monomorphic operators make type-information more explicit and also make the code easier to understand (because reading code means mentally keeping track of value types, however abstract they may be - using dynamically typed languages means doing the type-inference in your head, a price you pay for more expressivity). Keep related definitions together, preferrable in the same compilation unit to give more opportunities for inlining. For the same reason hide unexported variables (for example by using modules) which means they can not be destructively modified from elsewhere. Non-tail procedure calls in CHICKEN are slightly more expensive than tail calls (while the latter is as costly as a raw C function call, the former needs allocation of a closure record) so the more the compiler is able to find out from the information you give him, the better.
Conclusion
To find out more ways of speeding up your code, check out CRUNCH, fast-generic or record-variants. Scheme is so flexible that there are countless possibilities to extend the language with forms the provide more opportunities for optimizing code, either manually or by assisting the compiler. CHICKEN tries to expose much of the internal machinery to the user but this does not automagically make your code run as fast as idiomatic C. There are a lot of implicit assumptions that apply in Scheme code, assumptions that may not be that obvious at first glance (genericity, global redefinitions, destructive mutations, eval, continuations, etc.). Experience with the system and the language is very important so: EXPERIMENT! There are a lot debugging options to peek at what the compiler is making of your code inside the various stages that it goes through. Use them and don't be afraid of asking on the CHICKEN mailing lists if you have a question.