Porting CHICKEN 5 code to CHICKEN 6


This guide explains how to port code from CHICKEN 5 to CHICKEN 6. The latter is still under development, but stable enough to start porting eggs. The exposed programming interface and procedure signatures can be considered fixed; any remaining work is very likely to consist of bug fixes and performance improvements that should not change the user-visible part of the system.

For a comprehensive list of changes, read the NEWS file in the CHICKEN source distribution. The following sections explain the most important changes in more detail.

Bootstrapping

To bootstrap this version, it is preferable to use the distribution tarball available here: chicken-6.0.0-bootstrap.tar.gz (SHA1: 8eee7e1fa12bb70932c7f453714855fd4aff724b).

The simplest method is to run the scripts/bootstrap.sh script, which downloads the tarball mentioned above and builds a "boot-chicken" that can then be used to build the final version.

It is also possible to build from git using several bootstrapping steps, as the compiler needs certain features to be able to build the next version. This process is somewhat involved, but is shown here for completeness:

The reason this is currently so complicated is that two internal changes influence the generated code: the internal string representation (now UTF-8) and the movement of certain library procedures to different modules. These changes require building CHICKEN compilers with successively added extensions and modifications; otherwise you will end up with an unbuildable system.

R7RS support

CHICKEN 6 now fully supports the R7RS (small) Scheme standard. The default language available when starting the interpreter or in toplevel compiled code is still R5RS Scheme, but the syntax and the available core modules are fully compatible with R7RS. Unfortunately, a few of these changes are not backwards compatible, so please read the following carefully to minimize porting problems.

The old r7rs egg has been fully integrated into the core system, including all the procedures, macros and modules it formerly provided. In particular, define-library is now a valid toplevel form.
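As a sketch of what this allows, the following defines a small library and imports it in the same file. The library name (example greet) and the procedure greet are hypothetical, chosen only for illustration.

```scheme
(define-library (example greet)
  (export greet)
  (import (scheme base))
  (begin
    ;; Build a greeting string for NAME
    (define (greet name)
      (string-append "Hello, " name "!"))))

;; The library can be imported right after its definition
(import (example greet))
```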

Most incompatibilities result in error messages and so are easy to localize, but in a few places the differences are more subtle and will be described below.

Read-syntax changes

In R7RS, hexadecimal escape sequences in string literals must be terminated by a semicolon (";"). CHICKEN 5 expected exactly 2 hex digits and no terminator, so such escapes lack the terminator that R7RS requires. As a slight help, CHICKEN assumes the hex sequence ends at the first non-hex-digit character, but still produces a warning.

For code that is intended to be backwards-compatible with CHICKEN 5, consider using the "\uXXXX" sequence, which expects exactly 4 hex digits and uses no terminator.
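For example, both literals below should denote the string "ABC": the first uses the R7RS hex escape terminated by a semicolon, the second the fixed-width "\uXXXX" form that CHICKEN 5 also accepts. The variable names are hypothetical.

```scheme
(import (scheme base))

;; R7RS hex escape: digits terminated by a semicolon
(define r7rs-style "\x41;BC")

;; Fixed-width escape: exactly four hex digits, no terminator
(define portable-style "\u0041BC")
```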

The read-syntax prefixes for case sensitivity ("#ci" and "#cs") have been removed; use the R7RS markers #!fold-case and #!no-fold-case instead.
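A minimal sketch of the replacement markers: identifiers read between #!fold-case and #!no-fold-case are folded to lower case, so Greeting below is read as greeting. The names are hypothetical.

```scheme
#!fold-case
;; Read as (define greeting "hi")
(define Greeting "hi")
#!no-fold-case
;; Case is significant again from here on
```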

R7RS-specific extensions to existing operations

To conform to R7RS the following procedures have been extended in functionality:

UNICODE

CHICKEN 6 fully supports the UNICODE character set in all strings and symbols. Source code may contain non-ASCII identifier characters if they belong to the "letter" character class. Internally, UTF-8 encoding is used.
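The following sketch illustrates the difference between characters and bytes, assuming a UTF-8 encoded source file: string-length counts code points, while the R7RS procedure string->utf8 exposes the encoded bytes. The identifier größe is only an example of non-ASCII letters in a name.

```scheme
(import (scheme base))

;; Non-ASCII letters may appear in identifiers
(define größe "größe")

;; Number of characters (code points)
(define n-chars (string-length größe))

;; Number of bytes in the UTF-8 encoding: ö and ß take two bytes each
(define n-bytes (bytevector-length (string->utf8 größe)))
```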

Strings passed to foreign code and to primitives that involve system calls are UTF-8 encoded. Invalid UTF-8 sequences are silently preserved, so they can be received and passed on transparently. On Windows, the "wide-character" variants of OS API calls are used, and strings are converted to and from UTF-16 on the fly.

Blobs vs bytevectors

The type blob has been renamed to bytevector: what used to be "blobs" in CHICKEN 5 are now R7RS bytevectors. Similarly, the blob foreign type is now named bytevector; the old name is deprecated but still available to make porting code a bit less troublesome.

The (chicken blob) module has been renamed to (chicken bytevector). A "shim" module is provided as an egg which simply reexports the bytevector operations under the old names.
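A typical port of blob code to the standard R7RS bytevector operations from (scheme base) might look like this; the CHICKEN 5 version is shown in comments for comparison, and the name buf is hypothetical.

```scheme
;; CHICKEN 5:
;;   (import (chicken blob))
;;   (define buf (make-blob 4))
;;   (blob-size buf)
;; CHICKEN 6, using R7RS bytevector operations:
(import (scheme base))

;; Allocate four bytes, all initialized to zero
(define buf (make-bytevector 4 0))

;; Store a byte at index 0
(bytevector-u8-set! buf 0 255)
```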

The "#$..." blob read syntax has been removed. Use "#u8..." instead, which supports both vector and string syntax (compatible with SRFI-207).
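For instance, the following two literals should denote the same three-byte bytevector, one written in vector syntax and one in SRFI-207 string syntax. The variable names are hypothetical.

```scheme
(import (scheme base))

;; Vector syntax: one integer per byte
(define as-numbers #u8(97 98 99))

;; SRFI-207 string syntax: ASCII characters stand for their byte values
(define as-text #u8"abc")
```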

Port encodings

Ports have a new property identifying the encoding used to read and write text to the underlying file. The procedures open-input-file, open-output-file, call-with-input-file, call-with-output-file, with-input-from-file and with-output-to-file, which already accept optional arguments specifying whether I/O should be in binary or text mode, now also respect encoding specifiers that determine what encoding to use. Currently supported are #:utf-8 and #:latin-1. Internal mechanisms to extend the set of supported encodings exist but are currently not exposed.

The current port-encoding can be accessed using the port-encoding procedure from the (chicken port) module. A SRFI-17 setter can be used to change the encoding. Note that changing the encoding while the port is accessed may result in unexpected consequences with buffered (peeked) characters.

Note that the R7RS string and bytevector I/O procedures allow accessing arbitrary ports, regardless of their encoding. Other R7RS implementations may forbid this.
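The sketch below writes a string with the #:latin-1 encoding and reads the raw bytes back. It assumes the file procedures accept the encoding and mode keywords as trailing optional arguments, as described above; the file name latin1-demo.txt is hypothetical. In Latin-1, "é" is the single byte 233, whereas UTF-8 would need two bytes.

```scheme
(import (scheme base))

;; Write through a port that encodes text as Latin-1
(with-output-to-file "latin1-demo.txt"
  (lambda () (write-string "café"))
  #:latin-1)

;; Read the raw bytes back; bytevector I/O ignores the port encoding
(define in (open-input-file "latin1-demo.txt" #:binary))
(define raw (read-bytevector 16 in))
(close-input-port in)
```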

Requiring bytevectors

The following procedures require bytevector buffers now:

In both cases, bytevectors are the better fit, as the operations either expect raw bytes or access files at the lowest level.

Moved or renamed core procedures

The syntax-error procedure is now a macro exported by (scheme base). Since the macro signals an error at compile (expansion) time, existing code that treats syntax-error as a procedure for signalling errors at run time will now fail during expansion.
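With the R7RS macro, syntax-error belongs in macro templates and reports misuse during expansion; code that signalled errors at run time should call error instead. Both names below (must-be-pair, check-positive) are hypothetical.

```scheme
(import (scheme base))

;; Expansion-time check: misuse is reported while expanding
(define-syntax must-be-pair
  (syntax-rules ()
    ((_ (a . b)) '(a . b))
    ((_ other) (syntax-error "must-be-pair: expected a pair" other))))

;; Run-time check: use error instead of syntax-error
(define (check-positive n)
  (if (positive? n)
      n
      (error "check-positive: not a positive number" n)))
```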

The vector-copy! procedure from (chicken base) had a different signature from the R7RS version. The incompatible definition has been removed, and now only the standards-compliant version is exported from (scheme base).
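The R7RS version takes the destination first, then the start index in the destination, then the source (optionally followed by start and end indices into the source). The CHICKEN 5 version took the source first, so calls must be reordered when porting. A sketch with hypothetical vectors:

```scheme
(import (scheme base))

(define src (vector 1 2 3))
(define dst (make-vector 5 0))

;; CHICKEN 5 (chicken base), source first:  (vector-copy! src dst)
;; R7RS: destination, start index in destination, then source
(vector-copy! dst 1 src)

;; Optional start/end select a slice of the source: copies 2 and 3
(define dst2 (make-vector 2 0))
(vector-copy! dst2 0 src 1 3)
```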

Removed procedures that were deprecated

New core procedures

The parameter include-path is available from the (chicken platform) module.

New core modules

The functionality of the srfi-4 module has been moved to (chicken number-vector), which has been extended to support 64- and 128-bit complex number vectors. The srfi-4 module is still available and reexports the relevant identifiers from (chicken number-vector).
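Existing SRFI-4 code can import the new module directly. This sketch assumes (chicken number-vector) exports the usual SRFI-4 names such as f64vector, which the srfi-4 reexport suggests; samples is a hypothetical name.

```scheme
(import (chicken number-vector))

;; A homogeneous vector of 64-bit floats
(define samples (f64vector 1.0 2.5 4.0))

;; Elements are accessed with the type-specific procedures
(f64vector-set! samples 0 0.5)
```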

Generative record types

Record structures defined with define-record-type are now "generative", as required by R7RS: every evaluation of the form produces a new record type, distinct from any previously existing type of the same name. Because record-type identifiers are therefore no longer plain symbols, the name alone is not sufficient to identify a specific instance of the record type. To access the actual type identifier (for example, to define a printer with set-record-printer!), pass the value bound to the variable that has the same name as the defined type:

(define-record-type point
  (make-point x y)
  point?
  (x point-x)
  (y point-y))

;; "point" holds the type identifier
(set-record-printer! point
  (lambda (p port) (display (list 'point (point-x p) (point-y p)) port)))

define-record still defines non-generative record types.
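The generative behaviour can be observed by evaluating the same definition twice, for example at the csi toplevel: an instance created from the first type is not recognized by the predicate of the second. This is a sketch with hypothetical names; compiled code or module bodies may warn about or reject such redefinitions.

```scheme
(import (scheme base))

(define-record-type point (make-point x y) point? (x point-x) (y point-y))
(define old-p (make-point 1 2))

;; Evaluating the same form again creates a new, distinct record type
(define-record-type point (make-point x y) point? (x point-x) (y point-y))
(define new-p (make-point 1 2))
```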

Module system changes

The "module aliases" srfi-0, srfi-6, srfi-9, srfi-11, srfi-23, srfi-39 and srfi-98 have been removed.
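Code that imported these aliases can usually switch to the R7RS libraries that provide the same identifiers. A sketch for a few of them, with hypothetical definitions for illustration:

```scheme
;; CHICKEN 5:
;;   (import srfi-11 srfi-39 srfi-98)
;; CHICKEN 6: the same identifiers come from the R7RS libraries
(import (scheme base)              ; let-values, make-parameter, parameterize
        (scheme process-context))  ; get-environment-variable

(define verbose (make-parameter #f))

(define (sum-and-product a b)
  (let-values (((s p) (values (+ a b) (* a b))))
    (list s p)))

;; Either a string or #f if HOME is unset
(define home (get-environment-variable "HOME"))
```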

The modules r4rs, r5rs, r4rs-null and r5rs-null are now accessible under the "scheme" namespace, as (scheme r4rs), (scheme r5rs), (scheme r4rs-null) and (scheme r5rs-null).

The module scheme is an alias for (scheme r5rs) to make porting existing code less painful. When you port some code and get errors about unbound identifiers such as open-input-string or make-parameter, adding an import of (scheme base) is usually enough. New code should instead replace imports of scheme with (scheme base) and add imports for the functionality that (chicken base) used to provide and that R7RS now makes available in one of the standard (scheme ...) modules.
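A minimal module written in that style might look as follows; the module name my-utils and the procedure read-all-lines are hypothetical. read-line, open-input-string and eof-object? all come from (scheme base).

```scheme
(module my-utils (read-all-lines)
  ;; R7RS base library instead of the old "scheme" module
  (import (scheme base))

  ;; Collect all lines from PORT into a list of strings
  (define (read-all-lines port)
    (let loop ((acc '()))
      (let ((line (read-line port)))
        (if (eof-object? line)
            (reverse acc)
            (loop (cons line acc)))))))

(import (scheme base) my-utils)
```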

FFI changes

Strings and symbols are no longer copied when passed to foreign code. The new internal representation already includes the terminating \0 character, so copying is unnecessary. As a consequence, foreign code must never modify a string destructively, since that would directly change the Scheme object.
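Because strings reach C as UTF-8 without copying, C sees the encoded byte length rather than the character count. The sketch below binds strlen through the FFI; the binding name c-strlen is hypothetical, and the file must be compiled with csc, since foreign-lambda is not available in the interpreter.

```scheme
(import (chicken foreign))

(foreign-declare "#include <string.h>")

;; strlen sees the UTF-8 bytes of the Scheme string, which is
;; passed without copying; C code must not modify it
(define c-strlen (foreign-lambda size_t "strlen" c-string))
```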

Tool changes

The command line option -r5rs-syntax has been replaced by -r7rs-syntax, disabling non-R7RS read syntax.

Other changes

Locatives on strings (from the (chicken locative) module) now index by code point, not by byte.
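For example, a locative created at index 2 of "größe" should refer to the third character, ö, even though ö occupies two bytes in the UTF-8 representation. A sketch with hypothetical names:

```scheme
(import (scheme base) (chicken locative))

;; Copy the literal to get a fresh string object
(define word (string-copy "größe"))

;; Index 2 is the third character, even though ö is two bytes in UTF-8
(define loc (make-locative word 2))
```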

The symbol-escape parameter from (chicken base) now also controls whether symbols containing special characters are printed using the |...| escape syntax or not.

The record-instance? and make-record-instance procedures from the (chicken memory representation) module now accept non-symbolic type identifiers to support generative record types.

The platform identifier for native Windows builds has been renamed from mingw32 to mingw.

The set-describer! procedure from (chicken csi) has been removed.

Eggs

Eggs that have already been ported to CHICKEN 6 will have changed to reflect the new features and changed semantics. Consult the relevant wiki pages for more information.

Porting other authors' eggs

If you want to port someone else's old eggs to a new CHICKEN version, follow the steps above for the porting itself. Beyond that, keep the following in mind.

First, you can find the repositories of CHICKEN eggs here:

Second, be sure to check with the original author (their email address is usually in the egg's metadata files) so they are aware of and okay with you porting their code. Finally, to get the egg listed in the index, announce it on the mailing list.