CHICKEN 5 roadmap (historical revision 31440)

You are looking at historical revision 31440 of this page. It may differ significantly from its current revision.

CHICKEN 5 roadmap

Here's a proposed list of things we would like to see in CHICKEN 5. Feel free to add more details if you know of a way to implement something or have an idea how to improve some part. Please, no editing flamewars here!

Modularising the compiler

There's a preliminary version of this in the compiler-modules branch. This is just an ugly first step to get the ball rolling. What remains to be done:

Define an "official API" for users of the compiler. Basically everything that's currently being done through ugly ##compiler# hacks should have a supported, documented way to do it. Later, we can expose more features.
- Hooks for adding new foreign types. Used by bind.
- (possibly?) Hooks for adding new compiler literals? Used by numbers.
- Some standard way to determine the current source file (ideally this would be a library procedure which works the same way in compiled and evaluated code). Used for things like the s48-modules egg.
- Perhaps a way to define new compilation stages.
Rename the compiler modules to a single namespace. Right now they use generic names like "optimizer" and "compiler", which might be re-used by user code and thus cause a namespace collision, even (especially?) if we do not expose the import library for some of these parts. Perhaps simply prefix them with "chicken-" or "chicken.". The latter would play nicely with the r7rs module system; you could import it as (chicken compiler) there, for example.

Reworking the core modules ("units")

Right now the modules supplied by core are somewhat arbitrarily named, and too many unrelated things are grouped together. We should go through the system, look at what we have, and then choose logical names. Suggestions appear later on this page, for further discussion. We should attempt to align the names with the r7rs naming conventions, to make things easy for that egg, and for people new to CHICKEN but familiar with other r7rs implementations. This probably means "scheme" should be renamed and split up into "scheme.base", "scheme.load", etc. A possible generalisation (or "convenience hack") could be to define the "scheme" module so that it imports all of the underlying submodules.

As I've posted to the mailing list, I think using hyphen makes more sense than using dot. --John Cowan

Replacing SRFI-14 with cset implementation from irregex?

This was discussed ages ago. It might be more memory-friendly and performant. One problem with the current SRFI-14 module is that it assumes Latin-1 encoding (and therefore can only handle 256 different characters), whereas most other CHICKEN components and eggs assume UTF-8.

Strong +1. --John Cowan

Refactoring the CHICKEN test suite to use a core library?

As we remove a lot of cruft that core doesn't need, it may be a good idea to add some things that we do need, like the test egg: there is a lot of macro code duplication in core's test suite. It's probably better to ship a well-designed testing library with core, which core itself can also use. This would also make it easier, should we decide to do so later, to format test output on Salmonella in a consistent manner for both core and eggs.

That could even be done for CHICKEN 4, since it wouldn't break anything. -- mario

Proposed libraries

Let's follow R7RS for these:

scheme.base
scheme.case-lambda
scheme.char
scheme.complex (when numbers is integrated)
scheme.cxr
scheme.eval
scheme.file
scheme.inexact
scheme.lazy
scheme.load
scheme.process-context
scheme.read
scheme.repl
scheme.time (need this? want this?)
scheme.write

What will we do with the SRFIs we implement? It would make sense to define the following, but it would be tedious to import all these:

srfi-2: and-let*
srfi-8: receive
srfi-31: rec
srfi-26: cut, cute
srfi-17: setter, getter-with-setter
srfi-10: define-reader-ctor
srfi-39: parameter objects

I'm planning to propose some of these (2, 8, 31, 26, 17) in a single R7RS-large library, probably called (scheme control) or (scheme control simple). --John Cowan
- Since this hasn't been standardised yet, and for improved compatibility and consistency with other Schemes, it's probably a good idea to define them as separate modules anyway. Note that this does not preclude re-exporting them elsewhere as well. --Peter Bex

Also, is it srfi-2 or srfi.2? The latter would match up with (srfi 2) usage which is reserved by R7RS for SRFIs.

If we get rid of dots, then it's just srfi-2 without special-casing it as the R7RS egg apparently does right now.

The list below is just a proposal and can be changed at any time. We should also keep an eye on R7RS WG2, which may define a few things CHICKEN currently defines already.

chicken.modules: module, import, export, reexport, define-interface, module-environment, functor, use
chicken.types: :, the, assume, define-type, define-specialization, compiler-typecase
chicken.reader-extensions: set-read-syntax!, set-sharp-read-syntax!, set-parameterized-read-syntax!, copy-read-table, current-read-table (perhaps re-export define-reader-ctor?)
chicken.fx (or chicken.fixnum?): fx+, fx-, fx/, fx*, fx<, fx<=, fx=, fx>, fx>=, fxand, fxeven?, fxior, fxmax, fxmin, fxmod, fxneg, fxnot, fxodd?, fxshl, fxshr, fxxor, fixnum-bits(?), fixnum-precision, most-positive-fixnum, most-negative-fixnum, fixnum?
chicken.fp (or chicken.flonum?): fp+, fp-, fp/, fp*, fp<, fp<=, fp=, fp>, fp>=, fpfloor, fpceiling, fptruncate, fpround, fpsin, fpcos, fptan, fpasin, fpacos, fpatan, fpatan2 (?), fplog, fpexp, fpexpt, fpsqrt, fpabs, fpinteger?, maximum-flonum, minimum-flonum, flonum-radix, flonum-epsilon, flonum-precision, flonum-decimal-precision, flonum-maximum-exponent, flonum-minimum-exponent, flonum-maximum-decimal-exponent, flonum-minimum-decimal-exponent, flonum?
chicken.syntax: er-macro-transformer, ir-macro-transformer, gensym(?), expand (is this useful at all?), get-line-number, strip-syntax.
chicken.bitwise: the subset of srfi-60 we support: bit-set?, bitwise-and, bitwise-not, bitwise-ior, bitwise-xor. Possibly complete it with the remaining operations, and call it just "srfi-60"?
chicken.ports: The current stuff in ports, except for the string ports in scheme.base (also, see below). Perhaps get rid of port-fold, copy-port, port-map, port-for-each?
chicken.exceptions (or srfi-12? Would make more sense, but what about our extensions? Put those in chicken.srfi-12?): All the exception handling stuff.
chicken.load: If we want to keep them, load-noisily, load-relative, load-library
chicken.format: [fs]?printf, format (do we need this?), pp, pretty-print, pretty-print-width

If you put use in a module, how do you get access to that module? I favor the R7RS solution, in which import does what Chicken use does, and is special-cased in terms of the module system so that it is always available. --John Cowan
- Right now, I think the module import/export forms are always available inside a module form. This is no different from special-casing it, I think (unless I'm misunderstanding something). --Peter Bex
There will be R7RS (scheme fixnum) and (scheme flonum) modules. I'm currently proposing to base the fixnums on R6RS and the flonums on math.h (not the egg of that name, but the whole C interface). --John Cowan
- That sounds like they'll be somewhat different from the list of identifiers we have. And it will take a while before it's finalized I guess, so it's safer to define our own and later add the r7rs versions if we deem it acceptable. --Peter Bex

Proposed removal from core

The list below is just one hacker's idea of what could go. Please add more.

SRFIs

SRFI-1, SRFI-13, SRFI-14, SRFI-18 might be removed. SRFI-69 will be removed, as discussed in CR #1142.

As pointed out several times by John Cowan, SRFI-15 (fluid-let) is unsafe in the presence of threads, and any use is most likely broken and should be replaced with R7RS/SRFI-39 parameters. Currently, core uses it in a few places, in a possibly dangerous way.

Most importantly, there is no reason it has to be in core, because it uses only basic primitives. I think it's best to delegate it to an egg.
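
A minimal sketch of the two styles, using a made-up setting called indent-level (all names here are illustrative and not taken from core). fluid-let temporarily assigns to an ordinary variable, so any thread that runs during the body sees the changed value; a parameter is rebound with parameterize, and the CHICKEN manual describes parameter values as thread-local.

```scheme
;; Illustrative names only; none of these are part of core.

;; Style 1: a plain global variable, temporarily rebound with fluid-let.
(define indent-level/global 0)

(define (report/global)
  indent-level/global)

;; Style 2: a SRFI-39 / R7RS parameter, rebound with parameterize.
(define indent-level (make-parameter 0))

(define (report)
  (indent-level))

;; Both forms restore the previous value once the body returns.
(define global-result
  (fluid-let ((indent-level/global 2)) (report/global)))

(define param-result
  (parameterize ((indent-level 2)) (report)))
```

In single-threaded code the two behave the same; the difference only shows up when another thread gets scheduled while the body is running.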

queue datatype (data-structures), binary-search (data-structures), mmapped files (posix), object-evict (lolevel)

Proposal already accepted in CR #1142.

I'm proposing a queue library for R7RS-large. --John Cowan
- It would be great if it could be inspired by CHICKEN's, but that's not strictly necessary, as there is plenty of room for multiple queue eggs --Peter Bex

combinators

Some of the combinators from data-structures are very nice, but only a handful of them are actually useful. There is no technical reason to keep them in core; they might fit better in an egg.

I'm proposing a similar library for R7RS-large. --John Cowan
- Maybe we can rip it out of core and wait for R7RS before implementing the egg. --Peter Bex

Various ill-conceived POSIX things

These are things I don't like, but that doesn't mean they *have* to go. They can always be put in an egg, of course.

file-select (but see the section about refactoring the scheduler!)
file-control (no need to be in core)
file-mkstemp (too tricky to use properly? maybe a different API)
file-read and file-write (too low-level)
file-stat (might be changed to return a record type?)
set-file-position! (see the section on I/O refactoring)
All the time stuff. It's too broken/difficult to use, and might be better off in an egg. Core uses some of it, so we may need to reconsider and just improve the API.
terminal-name, terminal-port?, terminal-size (but chicken-status uses it!)
The process stuff. There are too many procedures, which is confusing. Boil it down to just one or two essential ones. Possibly make a "fork&exec" implementation, which maps better to the Windows model, and still works fine on UNIX.

Better API for continuations

Nobody seems to use the "better API for continuations" by Feeley: continuation-graft, continuation-capture, continuation-return, continuation?

If it doesn't benefit anyone (core doesn't use it, only two eggs do: shift-reset and continuations), it can be taken out. It might be put into an egg.

+1 for an egg. I'm going to propose this for R7RS-large. --John Cowan

Reworking the way libraries are loaded

Right now there are just too many confusing things, like require, require-extension, use, import, load, load-library, require-library.

Import (with the function of use) should be the main API. Load is necessary because it can load things whose names are determined at run time. It should be able to load either source or binaries. Include also belongs here. --John Cowan

Units and modules are confusing also. This could just be a documentation issue.

Units should IMO be deprecated, with a compiler switch to turn off deprecation when compiling Chicken itself. --John Cowan
- I disagree: there's no reason why core should be "special" in any way. We could de-emphasize their importance in the manual, instead. --Peter Bex

Make the library load path a search path

This keeps cropping up on IRC: people expect to be able to load libraries from their eggs using a search path containing multiple entries. This would allow you to (use ...) a module from your application without installing it as an egg.

This is rather tricky: what happens when you compile it and install the whole program into some other location? Also, changing the way it's implemented is nontrivial, as it has been attempted before (see #736).

Refactoring the scheduler

One missing ability in the scheduler is for threads to block on more than one object. This would allow us to generalise file-select to ports.

Refactoring the I/O (ports) system

Currently, ports are somewhat ill-defined: they're a hand-coded record type with a bunch of slots, with comments indicating which slot is used for what. It would be cleaner and easier to understand the code if this was changed to a "proper" record type.

The current-*-port identifiers should be rewritten to be proper parameters instead of fake ones which are rebound through fluid-let.

Recently I discovered that set-file-position! does not work on string ports. Port position should be part of the official interface, so that this is extensible, and if a port implements it, it can be rewound. This makes sense at least for file-backed ports and string ports.

Well, not all file-backed ports are seekable. --John Cowan
- That's okay; they can throw a "not implemented" exception. --Peter Bex

This is also a good opportunity to look at why I/O is so slow.

One small improvement I'd like to make is to change write-string to accept an offset into the string from which to write. This would mean writing substrings does not have the overhead of first having to copy the substring to a new string and then writing it. I ran into this once and I thought it was a shame, because it's such a trivial (but incompatible) modification.
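
To illustrate the interface this asks for, here is a rough sketch. write-substring is a hypothetical helper, not an existing CHICKEN procedure, and the real change would presumably live inside write-string itself. It only shows that a start/end pair is enough to write part of a string without allocating a copy first.

```scheme
;; Hypothetical helper, for illustration only.
;; Writes the characters of STR in the range [START, END) to PORT
;; one at a time, so no intermediate substring is allocated.
(define (write-substring str start end port)
  (let loop ((i start))
    (when (< i end)
      (write-char (string-ref str i) port)
      (loop (+ i 1)))))

;; Usage: write only "world" out of a longer string.
(define out (open-output-string))
(write-substring "hello, world" 7 12 out)
```

Writing character by character is unlikely to be fast; an implementation in core could hand the offset straight to the underlying port writer instead.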

Integrating the full numeric tower

Obvious, but a lot of work which will probably result in a long tail of fallout (issues, bugs, missing support).

Important TODOs (most of which can be done inside the numbers egg):

Get rid of the last few malloc()/realloc() calls (for this I must (re-)study the division algorithm and hack it up further. Or reimplement it from scratch, if necessary: the current code is massively hairy)
Figure out an acceptable C API. Right now a few things are in Scheme that might be better implemented in C. Or at least should be callable from C somehow (notably parsing numbers).
(related to the above): Rename the C API functions to be more "CHICKENy", and get rid of all the strange S48/MIT-style C macros. This also involves using C_word at the right places, instead of "int" or "long".
Update the FFI and other subsystems to know about the new number types, so that e.g. full 64-bit integers are mapped to bignums and vice-versa (with an exception thrown if the value doesn't fit the given C type).
Add a new number-type ("smallnums?") that corresponds to the current "generic" type. Possibly add a new integer type that allows only fixnum/bignums.

Once we get around to implementing this, the main task is going through the entire C API and converting all the inline and non-allocating operations to CPS, and/or make them accept an allocation pointer.

String encoding

This at least needs some additional thought. Do we want to make UTF-8 the "official" encoding? If so, ideally, all string operations should reject invalidly encoded byte sequences (should we still allow NUL bytes to be represented?). What to do with the Unicode case folding lookup tables, string-ref?

Go full Unicode. If Chibi can do it, so can we. R7RS is factored to push the big Unicode tables into (scheme char). However, IMO the NUL character is completely worthless as a character: it has no semantics worth mentioning. We can forbid it in strings, as R7RS-small allows. --John Cowan
- Seems sensible. --Peter Bex

If we go full Unicode, the SRFI-4/blob types might need some attention, because strings can no longer be (ab)used as byte vectors.

Why are there both u8vectors and blobs? IMO they should be the same thing, and should be R7RS bytevectors. I'm working on a R7RS-large numeric vector library that allows either SRFI 4 style (separate data types for different kinds) or the style used in later SRFIs and R6RS (everything is just a view on top of bytevectors). --John Cowan
- u8vectors are less "core" than blobs (which is a consequence of the low-level representation). In fact, we might be able to take srfi-4 out of core. --Peter Bex

Rewrite chicken-install and make setup-files declarative

This will make it easier to make eggs statically compilable, cross-compile them and perhaps integrate them into other build systems. It will also mean that setup-files won't be running arbitrary Scheme code as root, which means it's more trustworthy.

A possible approach would be to create a module that exports only the list of things we want to support in setup-files. Then they will begin with (module () (use setup-files) and can still be executed, but only the whitelisted operations will be permitted. --John Cowan

Another thing we need is to make files installed by eggs more explicitly registered: currently "chicken-uninstall" will simply remove all libraries that have the given name as a prefix (see #1093).

Make set!'ing of unbound variables an error

R7RS recommends making this an error for modules but allowing it in the REPL.

We already check for renaming already bound identifiers, maybe that's not so hard after all. I will investigate this --Christian Kellermann

Determine how to make CHICKEN 4 eggs live alongside CHICKEN 5 eggs

Currently, "THE SYSTEM" does not have any special considerations for the major CHICKEN release used. This could be considered an oversight. To make it possible to continue using CHICKEN 4 eggs while CHICKEN 5 is being developed and matured, there needs to be some sort of way to do this.

Currently, we have the master list of available eggs, which lives in the svn repo. THE SYSTEM is extremely simple and doesn't really care much about how eggs are supplied, so we could just fire up a second instance of henrietta-cache which fetches from a different master list containing the CHICKEN 5 eggs. However, what can we do to make life easier for egg maintainers?

The official CHICKEN egg repo (SVN) has already taken care of this due to the /release/N namespacing. The thing that needs to be changed is the location of the henrietta CGI, to include a version number, or we could add an extra URL parameter and teach it about the versions.

For user repos, a simple way is to start a second repository and call it a day. However, this will probably result in awkward names. Making a new branch results in the same problem: the master branch would correspond to an outdated release!

The simplest approach: just carry on

Just continuing in the old repository is possible, if no new releases need to be tagged for the old CHICKEN release. This mostly precludes emergency bugfix releases, but these could be continued on a different branch (release-info only takes into account tarballs which get generated from a tag name, after all!).

To prevent version tag clashes, the egg's major version should be bumped for CHICKEN 5. Let's take for example an egg which has released 1.0, 1.2 and 1.3 for CHICKEN 4. If we bump the major version, we can release 2.0, 2.1, etc for CHICKEN 5. If an important bugfix needs to be made for the CHICKEN 4 version, we can continue with 1.4. If we don't bump the major version, the egg would be forced to use micro version numbers for those, like 1.3.1. Both approaches are fine, depending on how much effort is expected to be put into the "old" branch.

The old release-info file will be untouched and continue to be used by the CHICKEN 4 version of henrietta-cache. For CHICKEN 5, a new file is made (i.e. myegg.chicken-5.release-info) which starts out empty, and as new releases are made will continue with the number where the CHICKEN 4 branch left off.
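
As a sketch of how this could look for the example egg above, assuming the existing release-info syntax stays as it is. The example.com URLs are placeholders, and the version numbers simply follow the major-version-bump example.

```scheme
;; myegg.release-info: untouched, still read by the CHICKEN 4 henrietta-cache
(repo git "git://example.com/{egg-name}.git")
(uri targz "https://example.com/{egg-name}/archive/{egg-release}.tar.gz")
(release "1.0")
(release "1.2")
(release "1.3")

;; myegg.chicken-5.release-info: starts with no releases, filled in as
;; CHICKEN 5 releases are tagged
(repo git "git://example.com/{egg-name}.git")
(uri targz "https://example.com/{egg-name}/archive/{egg-release}.tar.gz")
(release "2.0")
(release "2.1")
```

An emergency fix for CHICKEN 4 would then add a (release "1.4") line to the first file only.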

Rework each egg's release namespace

Another, possibly cleaner, approach is the following:

When an egg is ported to CHICKEN 5, rename or copy all existing tags, prefixing them like chicken-4/1.2, for example.
Make a chicken-4 branch from master and update the release-info file's location in the master egg list for CHICKEN 4.
Clear the release-info file in master, and submit its location for inclusion in the master egg list for CHICKEN 5.

This way, new eggs and old eggs will always have the master branch point to the active version. It does mean a little bit more work on every major release.

To avoid having to clear the release-info file every time, we could also extend it to include a major release version number (and if it's missing, assume "4"?). This means the release-info file would list both CHICKEN 4 and CHICKEN 5 (and later CHICKEN 6) releases in the same file. This might make maintenance a little easier, but requires a small change in henrietta-cache.

IMO the brains should be in the henrietta web API. --John Cowan
- I don't think that's necessary. In any case, there must be some way for the egg authors to indicate for which CHICKEN version the egg is. --Peter Bex