The CHICKEN Contribution Guide

by Felix Winkelmann

This document is an attempt to describe the various components of the CHICKEN Scheme ecosystem, with a view to showing where inclined users can contribute and help us improve the system.

As I have worked on this project for more than 25 years, my opinions are naturally biased, but I will try nevertheless to give an objective assessment of the scope and complexity of the many parts of CHICKEN.

If you want to help us make CHICKEN better and more usable for others, then your efforts will be greatly appreciated, as we hope to maintain, improve and extend the system further. Hacking on CHICKEN should also be fun, so if you want to help, consider something that you enjoy or that you are interested in.

The codebase has seen many years of maintenance and constant improvements, which may be reflected in a certain "messiness" of the sources in some places. We are fully aware of this, but fixing users' problems has always had priority over making the code a shining example of software engineering excellence.

Once you study the CHICKEN sources, you will doubtlessly have many questions, so feel free to ask - remember, there are no stupid questions. The most direct way is to pop in at the #chicken IRC channel at irc.libera.chat. There is always someone around who can help you, but if your question isn't addressed instantly, wait a couple of hours and someone will respond.

For more elaborate questions or clarifications, consider subscribing to the chicken-hackers mailing list, where the core developers and everybody interested in the maintenance of CHICKEN exchange information.

The most active core maintainers are currently Peter, Mario and me. We can answer most questions regarding the code and infrastructure, so don't be afraid to approach us. We all have day jobs and may be busy, so if you don't hear from us right away, have patience.

The development philosophy we try to follow is to keep things minimal, simple and dependency free. This is not always achieved, yet we try our best (and fail sometimes). If additional features can be moved into separate extensions, then one should do so to keep the core system lean.

Finally, if you have ideas and suggestions for certain areas that in your mind need improvement, feel free to add them here, perhaps after discussing the idea with the maintainers.

Eggs

The easiest way to contribute to CHICKEN is to write extension libraries ("eggs") and enlarge and/or improve our ever-growing collection of bindings and utilities; see the eggs tutorial for more information. You can of course also assist egg maintainers in improving or porting their extensions. If you do so, make sure to contact the author or maintainer of the egg before suggesting changes.

Eggs are hosted in a distributed fashion. To contribute patches to eggs, check where the egg is hosted and, based on that, infer how to send patches. For example, for eggs hosted on sites that work with the pull request approach, like Codeberg, GitHub or GitLab, you can probably submit pull requests. Usually the location of an egg's source repository is referenced in the egg's documentation. If that is not the case, you can ask the author of the egg to update the documentation and piggyback a question on how patches are supposed to be submitted.

Documentation

Documentation is an area that is in constant need of contributions. Here contributions do not necessarily require a deep understanding of CHICKEN or even Scheme: there are a lot of improvements that can be made in terms of organization, consistency, spelling and wording.

Historically, documentation of the core system and the extension libraries (eggs) is held in a wiki at wiki.call-cc.org, where the CHICKEN core and eggs live in different areas, organized by CHICKEN major version: https://wiki.call-cc.org/man/<major-version> and https://wiki.call-cc.org/eggref/<major-version>/<egg> respectively.

If you want to make larger changes or if you are unsure about something, contact us (for CHICKEN core's manual contributions), or the maintainer of the egg (if you refer to an egg's documentation) first.

The recommendation for patches to the CHICKEN core manual pages is to send patches to the chicken-hackers mailing list. See also the "Guidelines for patches targeting the chicken-core repository" section in this document.

Infrastructure

One important but less visible aspect of CHICKEN is the infrastructure to make eggs available to users, utilizing a distributed approach to collecting egg sources and serving them on one of our egg mirrors, called "THE SYSTEM". If you want to help add features or improve the service, contact us and ask for more information. Most of the code here is written in Scheme and shell scripts.

Serving eggs is done via the henrietta and henrietta-cache eggs. Providing an additional mirror site for our eggs might be a possible way of supporting us, but that necessarily means a bit of maintenance work.

We use our own wiki engine and utilize Trac for bug tracking, which has served us relatively well. Recently, however, Trac has turned out to be quite resource intensive due to excessive web crawler access, and offline use is not possible, so there might be room for improvement or even a completely new approach.

Core System

The compiler, interpreter, egg tools and libraries making up the base CHICKEN distribution are what we call the "core", which is mostly written in Scheme with some parts in C. The core system is made up of various parts, which I will describe in the following sections.

Our bug tracker may be a good start to take a look at open issues, but it is not too well maintained, so some tickets may be out of date or obsolete.

Compiler

This is the oldest part of the system and translates Scheme to C, following Henry Baker's "Cheney on the MTA" compilation strategy. A number of passes analyze, transform, optimize and translate the code, generating C in the final stages.
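The core idea of this strategy can be illustrated with a minimal C sketch. It is not CHICKEN's generated code, and all names in it (sum_to, run_sum, STACK_LIMIT, and so on) are hypothetical: a function in continuation-passing style never returns normally, it only calls onward, and when the C stack grows past a budget the pending call is saved and longjmp() unwinds back to a trampoline that restarts it on an empty stack.

```c
#include <setjmp.h>
#include <stdint.h>

/* All names here are hypothetical; this is not CHICKEN's generated code. */
#define STACK_LIMIT (64u * 1024u)    /* arbitrary stack budget for the sketch */

static jmp_buf trampoline;           /* landing point that resets the C stack */
static uintptr_t stack_base;         /* stack address recorded at start-up */
static long long saved_n, saved_acc; /* pending call, kept off the stack */
static long long result;

/* Approximate stack usage as the distance between a local and the base. */
static int stack_exhausted(void)
{
    char marker;
    uintptr_t here = (uintptr_t)&marker;
    uintptr_t used = here > stack_base ? here - stack_base : stack_base - here;
    return used > STACK_LIMIT;
}

/* CPS-style function: it calls onward and never returns normally. */
static void sum_to(long long n, long long acc)
{
    if (stack_exhausted()) {
        /* A real system would copy live stack data to the heap here. */
        saved_n = n;
        saved_acc = acc;
        longjmp(trampoline, 1);      /* throw the whole stack away */
    }
    if (n == 0) {
        result = acc;
        longjmp(trampoline, 2);      /* "invoke" the final continuation */
    }
    sum_to(n - 1, acc + n);
}

/* Trampoline: restarts the saved call each time the stack was reset. */
long long run_sum(long long n)
{
    char base;
    stack_base = (uintptr_t)&base;
    saved_n = n;
    saved_acc = 0;
    if (setjmp(trampoline) == 2)
        return result;
    sum_to(saved_n, saved_acc);
    return result;                   /* not reached */
}
```

Calling run_sum(100000) computes the sum 1 + 2 + ... + 100000 without the recursion depth being limited by the C stack. The sketch only carries integers across the reset; the garbage-collection side of the real strategy, where live data allocated on the stack is moved to the heap, is omitted.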

The basic structure is relatively straightforward: analysis passes are followed by transformation phases, repeated until no further improvements can be found. An internal node tree holds expressions and is partially transformed in passes that optimize and simplify the currently compiled program. A final pass converts the node tree into C code fragments and writes them into the output file. The csc compiler driver invokes the base compiler (chicken) and then the C compiler and linker, if necessary.
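The "repeat until nothing more can be improved" structure can be sketched as a fixpoint loop over a tiny node tree. This is only an illustration under simplifying assumptions: the node layout and the names simplify_pass and optimize are hypothetical, and the only rewrite performed is constant folding.

```c
#include <stddef.h>

/* Hypothetical miniature node tree; the real one is a Scheme structure. */
typedef enum { N_CONST, N_VAR, N_ADD, N_MUL } node_kind;

typedef struct node {
    node_kind kind;
    long value;                 /* payload for N_CONST */
    struct node *left, *right;  /* operands for N_ADD and N_MUL */
} node;

/* One pass: fold an operator whose operands are already constants,
   otherwise descend. Returns the number of rewrites performed. */
static int simplify_pass(node *n)
{
    if (n->kind != N_ADD && n->kind != N_MUL)
        return 0;
    if (n->left->kind == N_CONST && n->right->kind == N_CONST) {
        long l = n->left->value, r = n->right->value;
        n->value = (n->kind == N_ADD) ? l + r : l * r;
        n->kind = N_CONST;
        return 1;
    }
    return simplify_pass(n->left) + simplify_pass(n->right);
}

/* Run passes until one of them changes nothing.
   Returns how many passes made progress. */
int optimize(node *root)
{
    int passes = 0;
    while (simplify_pass(root) > 0)
        passes++;
    return passes;
}
```

For the tree representing (1 + 2) * (3 + 4), the first pass folds the two additions, the second folds the multiplication, and the third finds nothing left to do, which ends the loop.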

Optimizer

Optimization passes in the compiler take the results of the analysis phases and perform certain simplifications and rewrites of the internal node tree used in the compiler. There are of course always areas where this could be improved, even though most optimizations that can be done with a reasonable amount of effort have been implemented (inlining, dead code elimination, some strength reduction). Additional transformations have come and gone, since a balance always has to be maintained between compile-time overhead, code complexity and the perceived results.

Adding more optimization passes is easy once the basic structure and the shape of the node tree and analysis database are understood. As the code is transformed to continuation-passing style and uses a somewhat original translation strategy, it puts certain constraints on the shape of the code in terms of allocation and control flow. When adding more optimizations, some prudence is required to balance compile time, memory overhead and compiler complexity in an acceptable manner.

Interpreter

The interpreter ("csi") provides interactive dynamic evaluation of Scheme code, using the evaluation machinery available in the runtime system, which can also be used in compiled code. The evaluator translates Scheme expressions into a closure tree using a technique invented by Guy Lapalme and Marc Feeley. This is straightforward to implement in Scheme, integrates easily, is compact and efficient enough for most uses, but can of course not compete with compiled code in terms of performance.
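The closure-tree idea can be sketched in C, with the caveat that the real evaluator is written in Scheme and uses genuine closures. In this hypothetical miniature, each expression is translated once into a record holding a function pointer plus captured data (names like gen_add and run_var are made up for the illustration); evaluation then just calls through the tree without re-inspecting the syntax.

```c
#include <stdlib.h>

/* A "closure" here is a function pointer plus its captured data. */
typedef struct code code;
typedef long (*run_fn)(const code *self, const long *env);

struct code {
    run_fn run;
    long value;         /* a constant, or an index into the environment */
    const code *a, *b;  /* captured sub-closures */
};

static long run_const(const code *c, const long *env)
{
    (void)env;
    return c->value;
}

static long run_var(const code *c, const long *env)
{
    return env[c->value];
}

static long run_add(const code *c, const long *env)
{
    return c->a->run(c->a, env) + c->b->run(c->b, env);
}

static long run_mul(const code *c, const long *env)
{
    return c->a->run(c->a, env) * c->b->run(c->b, env);
}

/* Allocate one closure record; aborts on allocation failure. */
static code *make(run_fn run, long value, const code *a, const code *b)
{
    code *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->run = run;
    c->value = value;
    c->a = a;
    c->b = b;
    return c;
}

/* "Compile" steps: each returns a ready-to-run closure. */
code *gen_const(long v)    { return make(run_const, v, NULL, NULL); }
code *gen_var(long index)  { return make(run_var, index, NULL, NULL); }
code *gen_add(const code *a, const code *b) { return make(run_add, 0, a, b); }
code *gen_mul(const code *a, const code *b) { return make(run_mul, 0, a, b); }
```

Building gen_add(gen_mul(gen_var(0), gen_var(0)), gen_const(1)) yields a closure for x * x + 1 that can be run repeatedly against different environments via its run member. The sketch never frees the records, which a real implementation would leave to the garbage collector.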

JIT compilation is currently not planned, as it just adds a large amount of target-specific code, and compilation to C is already available to obtain results that are likely to be more efficient (and allows easy access to native code and operating system services).

Syntax expander and Module system

This part expands syntax ("macros") and resolves modules and is one of the most complex subsystems of CHICKEN. It evolved steadily from a low-level Lisp-like namespacing system and procedural ("defmacro") expander, via first attempts at hygienic macros, into a feature-rich module system and hygienic macro expander supporting syntax-rules and procedural explicit- and implicit-renaming macros.

The original namespacing method used symbols as first class objects, holding name and value slots (as in classical Lisp). This hasn't changed, as it is simple to implement and allows attaching data to identifiers (like property lists), passing them around as efficient placeholders. The module system, on the other hand, "overlays" (so to speak) a namespace over this, kept in something called a "syntactic environment" that maps identifiers (symbols) to other symbols holding the actual values, the first identifiers being used just for their name.

The module system started from one similar to the Chez (Waddell/Dybvig) module system and has now been extended to incorporate features needed by R7RS libraries.

Hygienic macros have their own syntactic environments to ensure their expansion does not break namespace barriers and are, indeed, hygienic.

This code desperately needs cleanup, but doing so requires deep knowledge of the whole system, so improvements in this area are probably not to be expected in the near future.

Scrutinizer

The "scrutinizer" performs a local lightweight flow analysis of code at the early stages of the compilation to help catch type errors and assist in some basic type-based rewriting of the code.

The analyzer started out simple but has become exceedingly complex, up to a point that makes me worry. The optimizations performed by the extended type analysis are worthwhile, though, and somewhat overlap with optimizations done by other stages of the compiler. Also, the reporting has become overly verbose, bordering on the confusing, so this part could be tweaked somewhat. Another issue of interest would be to compare optimizations done by the scrutinizer (and encoded in the types.db table) with those done by the optimizer (c-platform.scm); perhaps some of the latter could be removed if they duplicate rewrites done by the former. The scrutinizer was introduced much later, so redundancies may lurk here.

Backend

The backend translates an intermediate node tree created during compilation into C. Other targets are currently not planned. I did start with a port to 'QBE' but got distracted, and I am now not so sure whether this is worth the effort, as C compilers optimize more aggressively and C is syntactically richer and more flexible, making the backend translation easier.

The backend (c-backend.scm) could doubtlessly benefit from some cleaning up, as it intertwines a lot of target-specific logic with C text generation, but since there is only one target language, the effort needed and the risk of breaking the code generation in subtle ways should be considered when touching this.

Runtime Library

The runtime library, written mostly in Scheme, with some parts in C or a mix of both, holds all the standard and non-standard library functions available in compiled or interpreted CHICKEN code. The library is used by the compiler and tools themselves as most of CHICKEN is written in itself.

Over time, the structure of the library sources has deteriorated somewhat, as adding modules has left the mapping between library routines and the modules that export them heavily intertwined (just look at library.scm to see what I mean). Also, the fact that CHICKEN is written in CHICKEN introduces subtle points of breakage when the compiler uses improved functionality of the runtime system. In that case, one needs to add features stepwise, always making sure a compiler (the chicken binary, mostly) is buildable to compile the new, improved library.

One area that needs improvement is the naming of low-level C routines, which is currently rather inconsistent. The general idea is to have safe variants that perform runtime checks and unsafe variants that are then used by the safe versions (e.g. C_i_car() vs C_u_i_car() for the unsafe variant). Optimizations can then transform accesses from safe to unsafe variants, consistently. Again, bootstrapping issues can come up if compiled code in the compiler uses variants that have been replaced or removed.
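The intended layering can be sketched as follows. The object representation and the demo_ names are hypothetical and deliberately do not match the real runtime; in particular, the safe variant here reports failure through a flag and a NULL result, where the actual runtime would signal a Scheme error.

```c
#include <stddef.h>

/* Hypothetical object representation, much simpler than the real one. */
typedef enum { T_FIXNUM, T_PAIR } demo_tag;

typedef struct demo_obj {
    demo_tag tag;
    long fixnum;                 /* payload for T_FIXNUM */
    struct demo_obj *car, *cdr;  /* payload for T_PAIR */
} demo_obj;

static int demo_error_flag = 0;  /* stands in for signalling an error */

/* Unsafe variant: the caller guarantees that x is a pair. */
static demo_obj *demo_u_i_car(const demo_obj *x)
{
    return x->car;
}

/* Safe variant: performs the runtime check, then delegates. */
demo_obj *demo_i_car(const demo_obj *x)
{
    if (x == NULL || x->tag != T_PAIR) {
        demo_error_flag = 1;
        return NULL;
    }
    return demo_u_i_car(x);
}
```

With a consistent naming scheme of this kind, an optimization that has already proven its argument to be a pair can replace a call to the safe routine with the unsafe one mechanically, by name.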

The library has some notable subsystems that warrant their own descriptions, which follow.

Garbage Collection

The compilation strategy naturally provides a multi-generation garbage collection scheme that works quite reliably, after being used for so many years. The code is hairy and could be cleaner, but there is no intent to change anything here, as it is so central to code execution and so important that it works flawlessly under various memory loads. Non-essential features like dynamic adaptation of memory areas have added some complexity, though their effectiveness is not necessarily clear. Some evaluation and performance testing in this area might be useful, yet the performance is so dependent on workloads and timing that it is next to impossible to make general assumptions about the perfect settings of all the knobs that control the collector (like minimum/maximum heap size, nursery size, etc.).

I/O subsystem

The I/O subsystem handles file and socket I/O. This is basically straightforward, but has subtle and complex interactions with the scheduler and the encoding/decoding layers of the UNICODE support implemented in CHICKEN 6, described below. To support variable encodings, recent changes to the code for port I/O were made in a somewhat haphazard way, so it might benefit from cleanups or a little bit more abstraction.

Scheduler

The scheduler manages the user-level ("green") threads offered in CHICKEN. We currently use a poll loop to multiplex blocking I/O operations, which is portable and effective, but signal handling and other asynchronous events make this quite involved. Using a more capable mechanism like kqueue or epoll/io_uring would be a significant improvement but requires (again) some deep knowledge of the system.
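The basic shape of such a poll step can be sketched with the POSIX poll() call. This is not the scheduler's code: wait_for_input and MAX_WAITERS are hypothetical names, and the sketch only answers the question "which descriptors have input available" for a set of blocked threads, retrying when a signal interrupts the call.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>   /* pipe(), read(), write() for callers feeding the loop */

#define MAX_WAITERS 16   /* arbitrary limit for this sketch */

/* Wait until at least one descriptor has input or the timeout expires.
   Sets ready[i] to 1 for each readable fds[i], 0 otherwise.
   Returns the number of ready descriptors, or -1 on error. */
int wait_for_input(const int *fds, int nfds, int timeout_ms, int *ready)
{
    struct pollfd pfds[MAX_WAITERS];
    int i, n, count = 0;

    if (nfds < 0 || nfds > MAX_WAITERS)
        return -1;
    for (i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    /* A signal may interrupt poll(); retry in that case. Note that this
       simple retry restarts the full timeout each time. */
    do {
        n = poll(pfds, (nfds_t)nfds, timeout_ms);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return -1;
    for (i = 0; i < nfds; i++) {
        ready[i] = (pfds[i].revents & (POLLIN | POLLHUP)) != 0;
        count += ready[i];
    }
    return count;
}
```

A scheduler built on this would mark the threads waiting on the ready descriptors as runnable and pick the timeout from the earliest pending thread timer. Even in this small form, the EINTR retry hints at how signals complicate the picture.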

There are known bugs (very subtle ones), and the scheduler was more than once a matter of discussion among the core maintainers, as our confidence in its reliability is not too high. Rewriting this would be a daunting, yet highly desirable task.

Support for native preemptive threads is currently not planned, as this introduces an enormous amount of complexity, non-determinism and possible performance bottlenecks, due to the nature of the code generation strategy and how garbage collection is performed.

Numeric tower

The numeric tower provides the code to seamlessly perform arithmetic on numbers of various types and handle the conversion between each type.

This part was mostly implemented by Peter Bex and works quite well, using sophisticated algorithms for bignum arithmetic. Performance analysis of the code would be interesting, but currently we are not in need of any improvements in this area.

Floating-point parsing and printing

Currently we use an ad-hoc approach for parsing and printing floating point numbers, using libc functions with some extra handling of special cases to ensure R7RS compliance. A worthwhile project would be to use a clean, efficient floating-point <-> string conversion algorithm (like, for example, seen here), which would give more control of standards adherence and performance. This would also avoid implementation or platform dependent quirks of the underlying C library.
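To make the trade-off concrete, here is a minimal sketch of what a libc-based printing routine can look like. It is not CHICKEN's actual code, and format_flonum is a hypothetical name: it tries increasing precisions until strtod() reads back exactly the same value, and handles infinities and NaN separately so that they come out in R7RS syntax.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Write the shortest "%g" representation of x that reads back as exactly
   x into buf. Returns the length written, or -1 if buf is too small. */
int format_flonum(double x, char *buf, size_t size)
{
    int prec, n;

    if (isnan(x)) {
        n = snprintf(buf, size, "%s", "+nan.0");
    } else if (isinf(x)) {
        n = snprintf(buf, size, "%s", x > 0 ? "+inf.0" : "-inf.0");
    } else {
        n = -1;
        /* 17 significant digits always suffice for an IEEE 754 double. */
        for (prec = 1; prec <= 17; prec++) {
            n = snprintf(buf, size, "%.*g", prec, x);
            if (n < 0 || (size_t)n >= size)
                return -1;
            if (strtod(buf, NULL) == x)
                break;
        }
    }
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The sketch leaves out details a Scheme printer needs, such as making sure an integral value like 1.0 is not printed as a bare "1", and it depends on the C library's locale and rounding behaviour. A dedicated conversion algorithm would replace the trial loop and remove that dependency.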

UNICODE

The unicode subsystem handles UNICODE-aware character and string processing. Support for port-specific codecs has been added as well.

With major version 6, CHICKEN now supports full UNICODE strings and transparent access to a string's constituent characters. Interfaces to the operating system need to be aware of the encoding used (UTF-8) in some places. Especially on Windows, things quickly get complicated, as the wide-character API expects UTF-16 encoding.
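The kind of conversion needed at such a boundary can be sketched as follows. The function names utf8_decode and utf16_encode are hypothetical and this is not the code used in CHICKEN; the sketch decodes one code point from UTF-8, rejecting malformed and overlong sequences, and re-encodes it as one or two UTF-16 code units.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from s (at most len bytes) into *cp.
   Returns the number of bytes consumed, or 0 if the input is malformed. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs n bytes. */
    static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    uint32_t c;

    if (len == 0)
        return 0;
    if (s[0] < 0x80)                { n = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid byte */
    if (len < n)
        return 0;                   /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, out-of-range values and surrogates. */
    if (c < min_cp[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return n;
}

/* Encode a valid code point as UTF-16; returns the number of units. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Characters outside the Basic Multilingual Plane need four bytes in UTF-8 but two code units in UTF-16, so string lengths and indices differ between the two encodings. That mismatch is one reason the Windows interfaces need extra care.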

UNICODE support is rather new and is likely to contain bugs and inefficiencies. More advanced features like grapheme clusters are not directly supported, but could be provided by extension libraries. This part also needs more testing and performance analysis, as all of this is new and may introduce bottlenecks and/or unexpected performance traps.

The Foreign Function Interface

Since CHICKEN produces C, interfacing to native code is straightforward. The FFI allows bindings to C functions, variables and values in various ways.

There is not much that could be improved, since the C compiler manages all the icky details like struct layout and alignment, calling conventions, etc. One area that might be explored more is interfacing to native threads; the concurrent-native-callbacks egg is one possible approach. Note that things like callbacks require some knowledge of the compilation and code generation strategy used, as the use of the C stack is somewhat unorthodox in this system.

Egg tools

The egg tools (chicken-[un]install and chicken-status) provide the build, installation and maintenance of installed extension libraries.

With CHICKEN 5, this has turned out to be quite workable. One area of improvement would be parallel builds of eggs, or enhancing the .egg description language for more complex build tasks. On the other hand, a simple and (mostly) declarative format like the one we have is easy to verify and avoids (many) security holes which a procedural approach (like the one used before CHICKEN 5) opened up.

The installation of extensions provides only a limited choice for placing and locating library code. Environment variables and command-line options allow extensive customization, but this is often confusing and has limitations. There are several external tools to manage multiple installations and parallel install and load locations, which shows that the design space is large and users require a great deal of flexibility in this regard.

Build System

The build system creates a CHICKEN installation from sources, either from precompiled C files or from the development repository.

Currently we depend on GNU make, due to some GNU-specific expressions used. Having a makefile that works with BSD make would be very desirable, as would making the whole build simpler, since in our perception we have reached a level of gnarliness that warrants a rethink of the current approach.

One suggestion was to have a "schemeish" (s-expr) description of the build and generate a portable makefile from that, which can then be shipped with the distribution tarball of CHICKEN. There is some parallel to .egg files and generated build/install scripts, which may give some inspiration.

The build has subtle requirements like bootstrapping, static builds and the circular nature of builds from Scheme sources vs builds from a distribution tarball containing precompiled C files.

We do not plan to support any other external build tools than make(1).

Guidelines for patches targeting the chicken-core repository

Backwards compatibility

Changes within a major version of CHICKEN are not supposed to break backwards compatibility.

Backwards incompatible changes require an increment of the major version of CHICKEN, which in turn requires porting eggs, so that's not something done often (normally once in a number of years -- after more than 25 years CHICKEN is at major version 6).

Testing changes before submission

CHICKEN 6 provides a script scripts/smoke-test.sh that can be used to verify that changes don't break basic functionality of the CHICKEN core (e.g., building, testing, installing eggs). It is recommended to run this script before submitting changes to the mailing list. For more details, check its help message by executing

 $ ./scripts/smoke-test.sh -h

Submitting patches

See the Submit patches section in the Contribute page.