CHICKEN 5 roadmap

  1. CHICKEN 5 roadmap
    1. Modularising the compiler [done]
    2. Reworking the core modules ("units")
      1. Replacing SRFI-14 with cset implementation from irregex? [irrelevant]
      2. Refactoring the CHICKEN test suite to use a core library? [status uncertain]
      3. Proposed libraries [incomplete]
      4. Proposed removal from core
        1. SRFIs [done]
        2. queue datatype (data-structures), binary-search (data-structures), mmapped files (posix), object-evict (lolevel) [done]
        3. combinators [status uncertain]
        4. Various ill-conceived POSIX things [status uncertain]
        5. Better API for continuations [status uncertain]
    3. Reworking the way libraries are loaded [incomplete]
      1. Make the library load path a search path [incomplete]
      2. Standardize import path behaviour
    4. Refactoring the scheduler [incomplete]
    5. Refactoring the I/O (ports) system [incomplete]
    6. Integrating the full numeric tower [done]
    7. String encoding [status uncertain]
      1. Reject all NUL bytes
      2. Unicode
    8. Improve the egg system [incomplete]
    9. Changes to set! [incomplete]
      1. Make set!'ing of unbound variables an error
      2. Make set!-ing of module-defined identifiers an error
    10. Determine how to make CHICKEN 4 eggs live alongside CHICKEN 5 eggs [incomplete]
      1. The simplest approach: just carry on
      2. Rework each egg's release namespace
    11. Check if it is possible to have both CHICKEN 4 and CHICKEN 5 installed system-wide [incomplete]

Here's a proposed list of things we would like to see in CHICKEN 5. Feel free to add more details if you know of a way to implement something or have an idea how to improve some part. Please, no editing flamewars here!

Modularising the compiler [done]

This work has been completed: in the chicken-5 branch the compiler is now composed of modules (prefixed with chicken.compiler). Some "nice to haves" are not yet implemented, however.

These should be considered after CHICKEN 5 is released. Of course, if you want to tackle one of these before, feel free to submit a patch.

Reworking the core modules ("units")

Right now the modules supplied by core are somewhat arbitrarily named, and too many unrelated things are grouped together. We should go through the system, look at what we have, and then choose logical names. Suggestions appear later on this page, for further discussion. We should attempt to align the names with the R7RS naming conventions, to make things easy for the r7rs egg and for people new to CHICKEN but familiar with other R7RS implementations. This probably means "scheme" should be renamed and split up into "scheme.base", "scheme.load", etc. A possible generalisation (or "convenience hack") could be to define the "scheme" module to import all of the underlying submodules.

Replacing SRFI-14 with cset implementation from irregex? [irrelevant]

This was discussed ages ago. It might be more memory-friendly and performant. One problem with the current SRFI-14 module is that it assumes Latin-1 encoding (and can therefore handle only 256 different characters), whereas most other CHICKEN components and eggs assume UTF-8.

This is not needed, because SRFI-14 is no longer part of core. The egg could still benefit from it, but it's not something that will hold up the CHICKEN 5 release.

Refactoring the CHICKEN test suite to use a core library? [status uncertain]

As we remove a lot of cruft from core which it doesn't need, it may be a good idea to add some things that we do need. Like the test egg: there is a lot of macro code duplication in core's test suite. It's probably better to ship a well-designed testing library with core, which core itself can also use. This would make it easier, if we decide to do this later, to format test output on Salmonella in a consistent manner for both core and eggs.

Proposed libraries [incomplete]

Refer to the concrete reorganization plan here.

What will we do with the SRFIs we implement? It would make sense to define the following, but it would be tedious to import all these:

srfi-2: and-let*
srfi-8: receive
srfi-31: rec
srfi-26: cut, cute
srfi-17: setter, getter-with-setter
srfi-10: define-reader-ctor
srfi-39: parameter objects

Also, is it srfi-2 or srfi.2? The latter would match up with (srfi 2) usage which is reserved by R7RS for SRFIs.

Proposed removal from core

The list below is just one hacker's idea of what could go. Please add more.

SRFIs [done]

SRFI-1, SRFI-13, SRFI-14, SRFI-18 might be removed. SRFI-69 will be removed, as discussed in CR #1142.

As pointed out several times by John Cowan, SRFI-15 (fluid-let) is unsafe in the presence of threads, and any use is most likely broken and should be replaced with R7RS/SRFI-39 parameters. Currently, core uses it in a few places, in a possibly dangerous way.

Most importantly, there is no reason it has to be in core, because it uses only basic primitives. I think it's best to delegate it to an egg.
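To make the suggested replacement concrete, here is a minimal sketch of a setting expressed as a SRFI-39/R7RS parameter instead of a global that gets rebound with fluid-let. The names log-level and current-log-level are made up for illustration and do not exist in core.

```scheme
;; Hypothetical setting, used only for illustration.
(define log-level (make-parameter 0))

;; Reads whatever value is in effect at the time of the call.
(define (current-log-level) (log-level))

;; parameterize rebinds the parameter for the dynamic extent of its
;; body instead of assigning to a shared global variable.
(define inside
  (parameterize ((log-level 2))
    (current-log-level)))

;; Outside the parameterize form the original value is visible again.
(define outside (current-log-level))
```

Here inside is 2 and outside is 0. Code that currently wraps a global in fluid-let could be converted along these lines.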

queue datatype (data-structures), binary-search (data-structures), mmapped files (posix), object-evict (lolevel) [done]

Proposal already accepted in CR #1142.

combinators [status uncertain]

Some of the combinators from data-structures are very nice, but only a handful of them are actually useful. There is no technical reason to keep them in core; they might fit better in an egg.

Various ill-conceived POSIX things [status uncertain]

These are things I don't like, but that doesn't mean they *have* to go. They can always be put in an egg, of course.

Better API for continuations [status uncertain]

Nobody seems to use the "better API for continuations" by Feeley: continuation-graft, continuation-capture, continuation-return, continuation?

If it doesn't benefit anyone (core doesn't use it, only two eggs do: shift-reset and continuations), it can be taken out. It might be put into an egg.

Reworking the way libraries are loaded [incomplete]

Right now there are just too many confusing things, like require, require-extension, use, import, load, load-library, require-library.

Units and modules are confusing also. This could just be a documentation issue.

Make the library load path a search path [incomplete]

This keeps cropping up on IRC: people expect to be able to load libraries from their eggs using a search path containing multiple entries. This would allow you to (use ...) a module from your application without installing it as an egg.

This is rather tricky: what happens when you compile it and install the whole program into some other location? Also, changing the way it's implemented is nontrivial, as an earlier attempt showed (see #736).
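As a rough idea of what a search path lookup means, here is a minimal sketch. It is not how the current loader works. The name find-in-path and the example directories are hypothetical, and the existence check is passed in as a procedure so the sketch does not depend on any particular file API.

```scheme
;; find-in-path: return the first "DIR/FILE" for which EXISTS? returns
;; true, or #f when no directory in DIRS contains FILE.
(define (find-in-path file dirs exists?)
  (let loop ((dirs dirs))
    (if (null? dirs)
        #f
        (let ((candidate (string-append (car dirs) "/" file)))
          (if (exists? candidate)
              candidate
              (loop (cdr dirs)))))))

;; Hypothetical layout: the application's own directory is searched
;; before the installed egg repository.
(define example-path '("./lib" "/example/egg-repo"))
```

With a real file system one would call (find-in-path "foo.so" example-path file-exists?). Earlier entries shadow later ones, which is what lets an application ship its own copy of a module without installing it as an egg.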

Standardize import path behaviour

Currently, import files are loaded from a different conceptual path than extensions, which use a different path than include files, and so on. We should standardize this behaviour, and allow the user to use multiple directories as the path.

It would also be nice for include to push the including file's directory onto the include path during expansion.

Refactoring the scheduler [incomplete]

One missing ability in the scheduler is for threads to block on more than one object. This would allow us to generalise file-select to ports.

Refactoring the I/O (ports) system [incomplete]

Currently, ports are somewhat ill-defined: they're a hand-coded record type with a bunch of slots, with comments indicating which slot is used for what. It would be cleaner and easier to understand the code if this was changed to a "proper" record type.

The current-*-port identifiers should be rewritten to be proper parameters instead of fake ones which are rebound through fluid-let.

Recently I discovered that set-file-position! does not work on string ports. Port position should be part of the official interface, so that this is extensible, and if a port implements it, it can be rewound. This makes sense at least for file-backed ports and string ports.

This is also a good opportunity to look at why I/O is so slow.

One small improvement I'd like to make is to change write-string to accept an offset into the string from which to write. This would mean writing substrings does not have the overhead of first having to copy the substring to a new string and then writing it. I ran into this once and I thought it was a shame, because it's such a trivial (but incompatible) modification.
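For comparison, R7RS specifies write-string with optional start and end arguments after the port. The intended behaviour can be sketched in user code as below. The name write-string/offset is hypothetical, and a real implementation would write the range in one go rather than character by character.

```scheme
;; write-string/offset is a hypothetical name. START and END are
;; character indices into STR, with END exclusive.
(define (write-string/offset str start end port)
  (do ((i start (+ i 1)))
      ((>= i end))
    ;; No intermediate substring is allocated.
    (write-char (string-ref str i) port)))

;; Writes only the last word of the string to a string port.
(define (demo)
  (let ((out (open-output-string)))
    (write-string/offset "hello, world" 7 12 out)
    (get-output-string out)))
```

Calling (demo) returns "world" without first copying that part of the string via substring.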

Integrating the full numeric tower [done]

This work has been completed: full support for the complete numeric tower is available in the chicken-5 branch. This includes support for literals in compiled code as well as full integration with the FFI.

String encoding [status uncertain]

Reject all NUL bytes

If we reject all NUL bytes inside strings, we can encode strings more conveniently by adding a NUL terminator to all strings (nothing else changes). If we do this, the FFI does not need to copy strings, which makes it much more lightweight.
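Rejecting NUL bytes implies a check wherever strings are created or modified. A minimal sketch of such a check in plain Scheme follows. The name string-has-nul? is hypothetical, and in core the check would presumably live in the runtime rather than in Scheme code.

```scheme
;; Returns #t when STR contains a NUL character, #f otherwise.
(define (string-has-nul? str)
  (let ((len (string-length str)))
    (let loop ((i 0))
      (cond ((= i len) #f)
            ((char=? (string-ref str i) (integer->char 0)) #t)
            (else (loop (+ i 1)))))))
```

A string for which this returns #f can be handed to C with a trailing NUL terminator without being truncated on the C side.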

Things to look into:

Unicode

This at least needs some additional thought. Do we want to make UTF-8 the "official" encoding? If so, ideally, all string operations should reject invalidly encoded byte sequences (should we still allow NUL bytes to be represented?). What to do with the Unicode case folding lookup tables, string-ref?

If we go full Unicode, the SRFI-4/blob types might need some attention, because strings can no longer be (ab)used as byte vectors.

Improve the egg system [incomplete]

Since this is a rather comprehensive point, there is now a separate document for it.

Changes to set! [incomplete]

Make set!'ing of unbound variables an error

R7RS recommends making this an error for modules but allowing it in the REPL.

Make set!-ing of module-defined identifiers an error

Make identifiers imported from modules un-set!-able, for both core and user-defined modules. set! on such identifiers should raise an error, whereas define should define a new variable (in the current module's namespace, if there is one).

Determine how to make CHICKEN 4 eggs live alongside CHICKEN 5 eggs [incomplete]

Currently, "THE SYSTEM" does not have any special considerations for the major CHICKEN release used. This could be considered an oversight. To make it possible to keep using CHICKEN 4 eggs while CHICKEN 5 is being developed and matures, we need some way to support both.

Currently, we have the master list of available eggs, which lives in the svn repo. THE SYSTEM is extremely simple and doesn't really care much about how eggs are supplied, so we could just fire up a second instance of henrietta-cache which fetches from a different master list containing the CHICKEN 5 eggs. However, what can we do to make life easier for egg maintainers?

The official CHICKEN egg repo (SVN) already has taken care of this due to the /release/N namespacing. The thing that needs to be changed is the location of the henrietta CGI, to include a version number, or we could add an extra URL parameter and teach it about the versions.

For user repos, a simple way is to start a second repository and call it a day. However, this will probably result in awkward names. Making a new branch results in the same problem: the master branch would correspond to an outdated release!

The simplest approach: just carry on

Just continuing in the old repository for each egg is possible, if no new releases need to be tagged for the old CHICKEN release. This mostly precludes emergency bugfix releases, but these could be continued on a different branch (release-info only takes into account tarballs which get generated from a tag name, after all!).

To prevent version tag clashes, the egg's major version should be bumped for CHICKEN 5. Let's take for example an egg which has released 1.0, 1.2 and 1.3 for CHICKEN 4. If we bump the major version, we can release 2.0, 2.1, etc for CHICKEN 5. If an important bugfix needs to be made for the CHICKEN 4 version, we can continue with 1.4. If we don't bump the major version, the egg would be forced to use micro version numbers for those, like 1.3.1. Both approaches are fine, depending on how much effort is expected to be put into the "old" branch.
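The ordering argument can be checked with a small sketch. It represents versions as lists of integers, e.g. (1 3 1) for 1.3.1. The name version<? is hypothetical, and this is not claimed to be how chicken-install or henrietta-cache compare versions.

```scheme
;; version<? compares versions given as lists of integers.
;; When one version is a prefix of the other, the shorter one
;; sorts first, so 1.3 comes before 1.3.1.
(define (version<? a b)
  (cond ((null? b) #f)
        ((null? a) #t)
        ((< (car a) (car b)) #t)
        ((> (car a) (car b)) #f)
        (else (version<? (cdr a) (cdr b)))))
```

Under this ordering, both a CHICKEN 4 bugfix release 1.4 (major version bumped for CHICKEN 5) and a micro release 1.3.1 (major version not bumped) sort after 1.3 and before 2.0.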

The old release-info file will remain untouched and continue to be used by the CHICKEN 4 version of henrietta-cache. For CHICKEN 5, a new file is made (e.g. myegg.chicken-5.release-info) which starts out empty and, as new releases are made, continues with the number where the CHICKEN 4 branch left off.

Rework each egg's release namespace

Another, possibly cleaner, approach is the following:

This way, new eggs and old eggs will always have the master branch point to the active version. It does mean a little bit more work on every major release.

To avoid having to clear the release-info file every time, we could also extend it to include a major release version number (and if it's missing, assume "4"?). This means the release-info file would list both CHICKEN 4 and CHICKEN 5 (and later CHICKEN 6) releases in the same file. This might make maintenance a little easier, but requires a small change in henrietta-cache.

Check if it is possible to have both CHICKEN 4 and CHICKEN 5 installed system-wide [incomplete]

Maybe the current build system allows that (better check). During the transition period, it would be nice to allow users (and packagers!) to have both CHICKEN 4 and CHICKEN 5 installed system-wide on the same system.

Basically, we'd need to take the major version into account when naming things (binaries, runtime library and local egg repo). E.g., csc5, csi5, libchicken5.so.<binversion>, lib/chicken5/<binversion>.
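The naming scheme above can be written down as a small sketch. The procedure name versioned-names is hypothetical, and the binary version passed in the example is only a placeholder.

```scheme
;; Builds the names from the example above for a given major version
;; and binary version (both integers).
(define (versioned-names major binversion)
  (let ((m (number->string major))
        (b (number->string binversion)))
    (list (string-append "csc" m)
          (string-append "csi" m)
          (string-append "libchicken" m ".so." b)
          (string-append "lib/chicken" m "/" b))))
```

For example, (versioned-names 5 9) gives ("csc5" "csi5" "libchicken5.so.9" "lib/chicken5/9"). None of these names collide with an unversioned CHICKEN 4 install.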

The hardcoded path part (lib/chicken) in ##sys#repository-path makes me (mario) believe it's not possible, but I may be missing something.