A guide to the CHICKEN compiler

internals

This is insane. What we clearly want to do is not exactly clear, and is rooted in NCOMPLR.[1]

Preliminaries

Files

The compiler is made up from the following files:

chicken.scm: command-line parsing
batch-driver.scm: option processing and invocation of the compiler passes
compiler.scm: basic passes of translation
optimizer.scm: high-level optimization
support.scm: various support routines
c-backend.scm: C-code generator
c-platform.scm: tables and rewrite rules
tweaks.scm: some performance hacks to speed up the compiler
banner.scm: the banner shown by -version
private-namespace.scm: some support code that defines a macro (private) to hide toplevel identifiers (very ugly)
chicken-more-macros.scm: non-standard low-level macro definitions
chicken-ffi-macros.scm: low-level macros for interfacing to foreign code
scrutinizer.scm: the type checker

At the start of most compiler files you'll find a funny list of symbols of the form (private compiler ...) . This uses an internal hook to rename the listed toplevel identifiers by prefixing them with ##compiler# (these are called qualified symbols). It is a crude way to separate the toplevel definitions of the compiler from user code and avoid one to stumble over the other.

Command-line parsing (chicken.scm)

In chicken.scm command-line arguments are collected and translated into canonical form by removing the hyphen and filtering run-time options (those starting with -:...). Also any options given in the environment variable CHICKEN_OPTIONS are added and -optimize-level is expanded into distinct, more specialized options. What remains is a list of symbols wth optional arguments which is then passed to compile-source-file in batch-driver.scm.

Note: the processing of command-line parameters can be customized by setting the parameter user-options-pass in the compilation environment.

The compiler driver (batch-driver.scm)

compile-source-file initializes global variables used by the compiler, default initialization- and termination-code that is to be added to the user-supplied source code. All command-line options are now processed and are used to set global and local variables that control the compilation process. compile-source-file contains many local procedures that are used for printing debugging- and progress output, or help processing complex option arguments. Also, the hook-procedure (infohook) for collecting line-number information is defined here.

Now, compiler extensions are loaded, features are registered and extra source code fragments (the ones mentioned above, those that are stored in postponed-initforms and the source-code equivalents of the -extension and -require-extension options.

More option processing follows. If profiling is enabled, extra source code is prepended to the compiler-generated code (initforms) that initialize the profiling library.

Next, the actual source file to be compiled is read in and code supplied via prelude, postlude, prologue and epilogue options is added. Reading in files uses the above mentioned infohook to fill the line-number database.

The user-preprocessor-pass is invoked on the source forms.

Now the source code is canonicalized (canonicalized-expression in compiler.scm), i.e. macro-expanded. After the canonicalization code for setting hidden variables for complex, immutable literals, code for invoking used units and code for collecting profile-information is added. So called compressed literals are expanded into code that reads them from a string literal (that simply contains the text-representation of the literal) and finally some cleanup-code is added.

The secondary line-number data base (line-number-database-2) is now stored in ##sys#line-number-database to make subsequent searches use the updated table (see below for more information about this).

If -check-syntax was given, the compiler terminates now. Otherwise, the user-pass is now applied to all toplevel forms.

The canonicalized source expressions together with all compiler- supplied code is now transformed into a node-graph, an abstract syntax tree made from record structures. user-pass-2 has a chance to run, if it was supplied, to perform some sort of pre-analysis on the code. Here the first invocation of analyze-expression in compiler.scm may happen.

If lambda-lifting has been enabled, it is performed now, after doing another round of analyze-expression (perform-lambda-lifting! in optimizer.scm).

Note: Since the analyzer may behave differently, depending on whether this is the first or a subsequent analysis, the global variable first-analysis is set to false in the preceding two optional analysis phases. The first analysis pass is slightly special and collects extra information that would be useless in the mentioned situations.

Some global tables are set to #f to reduce the amount of heap-space used. Then any toplevel assignments that precede any side-effecting expressions are collected and marked as being always bound (no bound-checks are necessary, since they are guaranteed to be bound) (scan-toplevel-assignments in optimizer.scm). Now CPS conversion takes place (perform-cps-conversion).

Here the fundamental compilation process starts: a loop performs first an analysis pass (analyze-expression), checking imports (enabled by -check-imports) if this is the first loop iteration, and then an optimization pass by calling perform-high-level-optimizations. If the latter returns true as its second result, then an optimization was triggered and the loop continues. If no optimization was triggered, and inline-substitutions (rewritings of primitive operations) have been performed yet, the enable inline-substitutions for subsequent optimization passes and the analysis-optimization loop continues. If -optimize-leaf-routines was given it is done now, preceded by yet another analysis pass (transform-direct-lambdas!) and the loop continues once more. All these passes build up a new node tree or mutate the existing one.

After the optimization passes, closure conversion is done (perform-closure-conversion). The user-post-analyis-pass may run now, if supplied. If -analyze-only was given, the compiler terminates now.

Preparation takes place, which prepares the optimized and closure converted node tree for code-generation (prepare-for-code-generation). generate-code in c-backend.scm is invoked now to emit the C code for the Scheme source that was compiled.

Cleanup stuff is done, if necessary (compiler-cleanup-hook in support.scm).

The compiler core (compiler.scm)

Nearly all global variables and constants used in the compiler are defined here. Some of these are set through the command line (in batch-driver.scm), but most are initialized at toplevel in this file. The procedure initialize-compiler initializes global tables and vectors to their default values.

Canonicalization

canonicalize-expression is the first compiler pass: it first invokes the hook ##sys#compiler-toplevel-macroexpand-hook on each toplevel expression - this is the main hook used for external macro expanders like syntax-case to expand toplevel forms completely. Before that any "pending canonicalizations" (simply a list of toplevel forms) are prepended to the proper source code, needed for extensions and compile time code that does funky stuff. Now each toplevel form is recursively processed, and (low-level) macros are expanded. Non-global identifiers are alpha-converted (renamed) and special forms are specially handled. Another notable thing is that the line-number database is updated here - the reader is invoked in a special way by the compiler to record the line-numbers of each list that begins with a symbol. After macro-expansion of a local form, each symbol-prefixed list is re-recorded with the line-number of the original form. Inline functions and constant are resolved. Foreign-function special forms are processed here too and store C code to be included by the backend and global information of foreign function wrappers that are to be generated later.

process-declaration parses declare forms and sets various global variables to reflect those declarations.

Macroexpansion

Usually, macros are expanded while walking the source s-expressions during canonicalization. Every list expression that begins with a symbol results in a lookup of the symbol in ##sys#macro-environment (a low-level hash table, see below). This hash-table maps symbols to single-argument procedures that (if the name matches) will be called with the complete form. The result of this expander procedure will be further processed (and possibly expanded again, if the result of the first expansion is another macro invocation).

As mentioned above, before a top-level form is canonicalized, ##sys#compiler-toplevel-macroexpand-hook is called for the complete toplevel form.

Line number information is stored in the line number database, yet another low-level hash table that maps symbols to a association list of s-expressions and line-numbers: reading a source expression is done in a special mode, where the reader is invoked with an infohook, a procedure that can do special processing for particular forms. The internal procedure infohoo in batch-driver.scm fills the line-number database with s-expression and the current line number. Later stages (for example when errors are detected during canonicalization) will try to find line-number information to give more meaningful error messages. When a form is macroexpanded during canonicalization (but not in ##sys#compiler-toplevel-macroexpand-hook), the new form is re-entered into the line-number database (mapped to the same line-number as the previous, unexpanded expression) (see update-line-number-database! in support.scm).

Note that this line-number mapping is only partially useful - complex macro-forms often are heavily transformed, so line-number information gets blurry after several expansions.

CPS conversion

perform-cps-conversion transforms the canonicalized code (which has already been converted into a node tree) into continuation passing style: continuations are made explicit by adding an additional argument to each procedure and procedure invocation and by inserting additional lambda nodes. The algorithm is relatively naive and taken more or less directly from Appel[2].

Analysis

analyze-expression does a static analysis on the node tree of the complete program. The information gained by recursively traversing the tree is stored in the analysis database, a hash-table that maps symbols (identifiers and lambda-ids) to a property list. Also, various global statistics are kept up to date (total program size for example). See here for a description of the properties. The exact meaning of the properties is subtle and the dependency among them subtle. After walking the node tree, the analysis database (called db from now on) is iterated over and the information collected during the walking refined by setting additional properties. The information available here is also used to trigger various warning messages. Most information here will control optimizations done at later stages. Note that this analysis pass will be performed many times.

Closure conversion

perform-closure-conversion does another general program transformation: closures are made explicit, that is, free-variable information is gathered and references to free non-global variables are converted into data-structure operations. The free variables which are refered in closed over procedures are created in explicitly managed data structures (closure records). This pass does both decorate entries in the db with additional information and rewrites the program. Closure conversion involves boxing: free variables that are assigned to are turned into 1-element data structures to allow sharing (strictly speaking this is only necessary for closures that are shared, but the analysis is not clever enough to figure this out). Boxed closure variables are referenced and assigned differently and the code is modified to invoke special compiler primitives to do these operations. Again, the algorithm to gather free-variable information is non-linear and pretty dumb.

Preparation

prepare-for-code-generation walks once again the node tree and performs various re-writings: variable references are turned into low-level forms with different forms for the different types of variable access (global or local, lexical variable access has by now already been converted into closure record operations). Allocation information is collected as well in this pass, which is quite important since the generated code must keep track of allocation counts to clean up the stack once it is exhausted. The allocation is counted in words and represents a rough estimate of the storage allocated in a procedure or whether a procedure allocated at all (this may trigger additional optimizations). This pass builds the list of procedures which are later turned into actual code - the whole program is just a list of functions with the toplevel forms outside of procedures put into a special "toplevel" procedure. The procedure list contains so-called lambda-literals, each holding quite a lot of information used by the backend. Also, the number of temporary variables needed by a generated C function is counted here and literals are specially marked.

The Optimizer (optimizer.scm)

This file contains the code to perform some high-level optimizations like inlining and rewriting complex forms into simpler or more low-level ones.

Top-level assignment simplification

scan-toplevel-assignments: All toplevel variables that are assigned before any side-effecting code is executed can be considered "safe" and will never need to be checked for being bound.

High-level-optimization

perform-high-level-optimizations does the following optimizations:

Optimize tail calls by replacing trivial continuation-procedures that only call their own continuation.

Perform beta-contraction (inline procedures called only once). This quite important since it removes many "administrative" continuation procedures introduced during CPS conversion.

Remove empty let nodes.

Evaluate constant expressions, i.e. do constant folding. This constant folding is quite general by calling eval on a form that contains only literals and known constant identifiers. Errors during evaluation are caught and trigger a compilation warning (and disable the constant folding by just compiling the expression as it is).

Substitute variables bound to constants with their value.

Remove variable-bindings which are never used (and which are not bound to side-effecting expressions).

Perform simple copy-propagation by replacing references to variables that are bound to other variables with the other one, in some cases.

Remove assignments to unused variables if the assigned value is free of side-effects and the variable is not global.

Remove unused formal parameters from known functions and change all call-sites accordingly.

Rewrite calls to standard bindings into more efficient forms ("Simplification") by consulting the rewrite-table, a table of standard and primitive system procedures.

Rewrite calls to known non-escaping procedures (procedures that are called but not returned from a function or assigned to lexical or global variables) with rest parameter to cons up the rest-list at the call-site, also: change procedure's lambda-list. This simplifies the processing and may trigger further optimizations (for example to remove the rest-parameter altogether if it is never used).

Remove calls to procedures that are declared as being "constant" (having no side effects), if their results are never used.

Perform general inlining (beta-reduction) if the called procedure is known (refers directly to a lambda or to a variable that has a known value) and "simple" (only does certain inexpensive operations and is not bigger than a certain threshold).

This procedure also invokes another optimization sub-pass called pre-optimization that simplifies if expressions that have certain known procedueres as their first argument.

Simplification

register-simplification registers simplification-rules for certain program nodes, namely those for procedure call (##core#call) and binding (let). The call simplification in turn consults the substitution-table which maps known library functions to simpler forms (or procedures testing for particular properties of the arguments to the known procedure). The let optimizations performs some more optimizations related to forms which are not directly connected to let but appear in such expressions:

Chained if expressions that do eq? comparisons on literlas are re-written to an internal switch form (that can later be turned into a real C switch). Note that the type of literal may an optimization of calls to eqv? into eq? and since case constructs expand into chained if forms, case with simple, immediate literals can often be compiled to C switch forms.

Certain nested let forms that are the result of letrec expansions are simplified.

Also, if nodes are sometimes optimized to an internal conditional operator that maps to C's ?: operator.

"letrec" optimization

reorganize-recursive-bindings performs a dependency analysis of letrec forms (actually the code that is the result of their expansion) and re-orders the bindings if the meaning is preserved to reduce interdependency (and thus reduce the number of assignments that are needed).

The substitution table

In simplify-named-call the information obtained from the substitution table (built up using rewrite) is used to run a set of hardcoded optimization rules with variable parameters, so called substitution classes. There are classes for many situations, like replacing a procedure call with a primitive operation, or folding multi-argument calls into nested two-argument calls. The classes are numbered.

Direct lambda transformation

transform-direct-lambdas! processes the node tree and walks lambdas that meet the following constraints: a) the procedure is only external (has no CPS redexes), b) all calls are either to the direct continuation or (tail-) recursive calls, c) no allocation takes place an d no rest parameter is used, d) the procedure has a known container (a named and known procedure) and all it's call-sites are known. These procedures are turned into direct lambdas: lambdas that do not need to be CPS converted. A back-transformation into direct style is performed here which generates faster code for simple expressions. This includes re-writing all call-sites since the calling conventions are differently for direct procedures.

Lambda lifting

perform-lambda-lifting! does so called lambda lifting, which means "lifting" local procedures that do not refer to any surrrounding lexical variables to top-level. If all call sites are known and the free variables are not assigned, then free variables can actually be passed as extra arguments. The algorithm used here is currently quite conservative and is not really worth all the trouble we go through. It works like this:

The set of possible liftable lambdas is generated out of stored variables in the db that refer to known procedures, together with a name->lambda map. Next a call-graph is generated from the set of liftable lambdas, also free variable information is stored in the call graph. Liftables that have free variables that are assigned to (and which itself are not refering to liftables) or which have more than a certain number of arguments are eliminated. Accessible variables for each liftable are collected into yet another a-list (map). Liftables that have unknown call-sites (call sites that itself are not contained in "known" lambdas) are eliminated, if the call-sites do not have access to all free variables of the liftable. Next liftables that call other eliminated liftables are eliminated. The latter process is performed iteratively until no more liftables can be eliminated. Extra arguments (former free variables) are now computed and all call-sites are updated accordingly. Then the remaining liftables are moved to toplevel (by modifying the program node tree).

Support routines (support.scm)

This file contains a multitude of helper procedures that process the db or node trees or s-expression trees.

C generation (c-backend.scm)

Here C code is generated from the optimized and prepared node tree, given in the list of lambda literals and decorated with all sorts of information to ease this process. generate-code does most of the work by once again walking the body (node-tree) of each lambda-literal and generating one or more C functions from the code. The previous passes have transformed the code into a relatively low-level form which makes the code generation quite straightforward, even if tedious. Complication arise only due to specific optimizations - here the introduction of more backend-specific special forms would have helped. Generation of code is split into sections, for code, trampolines, declarations, function prototypes, code included verbatim, prototypes for callbacks (externals) and the ptable which maps function id's to code-pointers (used for serialization/deserialization).

The toplevel lambda is treated specially, as it must create all non-immediate literal data and define global variables (which are first class in CHICKEN).

The second job is generating the stubs and wrappers for foreign procedures, which can get quite involved as argument-translation form Scheme to C and back is partially done here (whatever is done on the C level - procedures in support.scm add yet another layer of transformation on the Scheme level).

C-specific parameters and substitution rules (c-platform.scm)

This file contains a bunch of constants and parameter values for various uses, for example a list of valid command-line options and the names of builtin procedures that have specific properties that can be used during optimization. In addition, the rewrite-rules for substitution (rewriting calls to known standard and non-standard library procedures to simpler or more efficient forms) are defined here, by invoking rewrite with the substitution class and class-specific parameters (the number of required arguments, is the rewriting only valid in unsafe mode, etc.).

The name of the file is somewhat misleading, as it was once intended to put all C-backend specific things into one file to ease the porting of the compiler to other backends. Since then all sorts of global parameters have moved in here.

Other Files

tweaks.scm

This file contains inline-definitions for node-access operations to speed up the frequent use of taking apart the node tree.

version.scm

The current version is contained here as a constant definition.

banner.scm

The banner shown when the -version option is given and the copyright notice, also as constants.

chicken-more-macros.scm

All non-standard macros are defined in this file.

chicken-ffi-macros.scm

All macros for interfacing to foreign code. The interpreter doesn't need FFI macros, so these are put into a separate file (instead of merging them into chicken-more-macros.scm). These files are included in the compiler and run-time-macros is declared, so the expanders are compiled into the executable.

The analysis database

During analysis a database is filled with information about global and local variables. The database is passed from one compiler pass to the next and completely rebuilt during each optimization iteration (and a few times more). The internal format is that of a hash table (a low-level hash table specialized for symbols), which is accessible using the primitives ##sys#hash-table-ref, ##sys#hash-table-set and ##sys#hash-table-for-each. In this hash table symbols are mapped to association lists that themselves map property indicators (symbols) to values of various types. The compiler options -debug 0, -debug 4 and -debug 8 show the most interesting entries in the db at various points in the compilation process, in a compressed format. See display-analysis-database in support.scm for more details.

The node tree

During analysis and optimization, the program is represented as an abstract syntax tree consisting of node records (see support.scm). Note that the accessors for node slots are not qualified with a ##compiler# prefix (originally for making it more convenient for user passes, but in the end probably a mistake, since it is easy for compiler-extensions or code loaded at compile-time to overwrite these but since the accessors are inlined (tweaks.scm), this is a minor problem). Each node has three slots: the class (a symbol like if, set! or lambda), the parameters (a list of arbitrary values, holding parameters that are specific to a node's class) and the subnodes, a list of nodes that hold nested program fragments (for example the body of a lambda or an application).