CHICKEN 5 roadmap: Improve egg system

This document is part of the CHICKEN 5 roadmap.

Work on this is currently in progress in the chicken-5-new-egg-install branch of the CHICKEN git repository. See this document for some very rough information about its current status.

Since we may break backwards compatibility with CHICKEN 5, we could use this opportunity to improve the current egg system. What follows is a list of possible improvements, each with its motivation and a potential approach to implementing it. You are welcome to extend and comment on it.

Re-introduce support for static linking

Motivation

The current system relies on the egg author to explicitly support static linking. This has proven not to work very well in practice and has led to static linking of eggs no longer being officially supported. While deployment remains an alternative approach for distributing self-contained programs with egg dependencies, static linking is still a frequently requested feature. Also, some platforms, such as iOS, don't support dynamic linking at all. Hence it would be worthwhile to re-introduce official support for static linking of eggs.

Goal

Static linking should just work, especially for basic extensions which only consist of a single module. However, it should also be possible to make more complex extensions statically linkable in a straightforward fashion. In other words, the new system should rely as little as possible on the egg author to explicitly support static linking.

Approach

One way to achieve this is to make the egg build process declarative, instead of having a procedural setup file that requires the egg author to explicitly include compilation and installation of static objects. For example, a declarative build description for a simple single-module extension like matchable could consist of just a single form like (extension matchable). The build system would then assume that the file matchable.scm defines a module named matchable, compile it into a static object, a shared object, and an import library, and finally install all of them. This build description would also provide ways to override these default assumptions (e.g. to pass special options to the compilation process).
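To make the idea concrete, here is a minimal Python sketch of how a build tool might expand such a one-line declaration into its default steps. Everything in it is hypothetical: the Extension and Step types, the action names (compile-static, compile-shared, emit-import-library, install), and the "<repository>" destination are illustrative stand-ins, not part of chicken-install, and no real compiler flags are implied.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    action: str                   # hypothetical action names, not real chicken-install verbs
    src: str
    dst: str
    options: Tuple[str, ...] = ()


@dataclass
class Extension:
    """Hypothetical parsed form of a declaration such as (extension matchable)."""
    name: str
    source: Optional[str] = None                      # override: defaults to <name>.scm
    options: List[str] = field(default_factory=list)  # override: extra compiler options


def build_plan(ext: Extension) -> List[Step]:
    """Expand one declaration into the default compile and install steps."""
    src = ext.source or f"{ext.name}.scm"  # assumed to define a module named <name>
    opts = tuple(ext.options)
    steps = [
        Step("compile-static", src, f"{ext.name}.o", opts),
        Step("compile-shared", src, f"{ext.name}.so", opts),
        Step("emit-import-library", src, f"{ext.name}.import.scm", opts),
    ]
    # Finally, install every artifact produced by the three compile steps.
    steps += [Step("install", s.dst, "<repository>") for s in steps]
    return steps


if __name__ == "__main__":
    for step in build_plan(Extension("matchable")):
        print(f"{step.action:20} {step.src} -> {step.dst}")
```

The design point is that the egg author writes only the declaration; the static object is always part of the default plan, so static linking does not depend on the author remembering to support it.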

I've drafted up some ideas of what such egg declarations might look like in these pastes: http://paste.call-cc.org/paste?id=b3ebab1e4b1f245115d746e06b894edf8f8a9dbc --Moritz Heidkamp

Consolidate the egg concept

Motivation

Currently the egg concept is rather ill-defined: eggs have a canonical name (as declared in the central egg index, i.e. the egg-locations file) and a version (as declared in the egg's release-info file), but may install various programs and extensions of arbitrary other names and versions (as passed to install-extension and install-program in the egg's setup file). Additional essential egg metadata (such as dependencies) is declared in the egg's meta file.

After installation into the local repository, an egg's canonical name and version are lost. This leads to hacks such as chicken-uninstall removing all libraries that have the given name as a prefix (see #1093). Another consequence is that when chicken-install resolves dependency version requirements, it checks against the versions used in the setup file rather than the canonical version, which may differ (often by accident).

Goal

There should be fewer places where an egg's metadata lives, ideally without any redundancy among them, to reduce accidental breakage. Also, eggs should be first-class citizens in the local egg repository.

Approach

We could make the release-info file the only place to declare a version number. This already works to some extent, as it is used when no version is passed explicitly in the setup file. However, that only works when the egg is installed via henrietta. When an egg is installed from a local directory, it ends up with the version "unknown", which is not ideal.

Programs and extensions would no longer be installed under individual names and versions; instead, they would be registered in the local egg repository under the egg's canonical name and version. Dependencies would then be resolved against these names and versions only. This registry would also record which files belong to an egg so that chicken-uninstall can cleanly remove it again.
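A minimal Python sketch of the registry idea, contrasted with a simplified model of the prefix heuristic from #1093. The names EggRegistry, register, installed_version, uninstall, and prefix_uninstall, as well as the eggs foo and foo-utils and their file names, are all hypothetical; this is not how chicken-install currently stores anything.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


def prefix_uninstall(installed: List[str], name: str) -> List[str]:
    """Simplified prefix heuristic: every installed file starting with `name` goes."""
    return [f for f in installed if f.startswith(name)]


@dataclass
class EggRegistry:
    """Hypothetical registry keyed by an egg's canonical name."""
    versions: Dict[str, str] = field(default_factory=dict)
    files: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, egg: str, version: str, installed_files: List[str]) -> None:
        # In the proposal, the version would come from release-info only.
        self.versions[egg] = version
        self.files[egg] = set(installed_files)

    def installed_version(self, egg: str) -> str:
        # Dependency resolution would check requirements against this value only.
        return self.versions[egg]

    def uninstall(self, egg: str) -> List[str]:
        """Forget `egg` and return exactly the files that belong to it."""
        del self.versions[egg]
        return sorted(self.files.pop(egg))


if __name__ == "__main__":
    installed = ["foo.so", "foo.import.so", "foo-utils.so", "foo-utils.import.so"]
    print(prefix_uninstall(installed, "foo"))   # also hits foo-utils

    reg = EggRegistry()
    reg.register("foo", "1.2", ["foo.so", "foo.import.so"])
    reg.register("foo-utils", "0.3", ["foo-utils.so", "foo-utils.import.so"])
    print(reg.uninstall("foo"))                 # only foo's own files
```

Because the file list is recorded at install time, uninstalling needs no guessing based on names, and an unrelated egg that merely shares a prefix is left alone.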

A declarative build system would blend well with this approach.

Restrict capabilities of setup files

Motivation

Currently, setup files may run arbitrary Scheme code, potentially even as root. This is a trust issue at best and a potential security vulnerability at worst.

Goal

Restrict what an egg's setup file is allowed to execute to make chicken-install more trustworthy.

Approach

A possible approach would be to create a module that exports only the operations we want to support in setup files. Setup files would then begin with (module () (use setup-files) and could still be executed, but only the whitelisted operations would be permitted. --John Cowan

There's a catch which might obviate this whole point: Since compilation will evaluate the syntax phase, arbitrary code may be executed anyway, so restricting the setup API is not enough. The only way I can think of to address this in a comprehensive way is to build eggs in a sandbox / jail / chroot / container which clearly lies beyond the scope of chicken-install, though. --Moritz Heidkamp
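The whitelist idea can be modelled outside Scheme as a dispatcher that only knows the permitted operations. In this hypothetical Python sketch, setup-file forms are plain tuples rather than s-expressions, the whitelist contains install-extension and install-program (both mentioned above) mapped to stand-in handlers, and run_setup and ForbiddenOperation are invented names. As noted in the reply above, a runtime whitelist like this does nothing about code executed during macro expansion at compile time.

```python
from typing import Callable, Dict, List, Tuple

# A setup-file form modelled as (operation, arg, ...) instead of an s-expression.
Form = Tuple[str, ...]


class ForbiddenOperation(Exception):
    """Raised when a setup file uses an operation outside the whitelist."""


def run_setup(forms: List[Form], allowed: Dict[str, Callable[..., None]]) -> None:
    # Check the whole file first so that nothing runs if any form is forbidden.
    for op, *_ in forms:
        if op not in allowed:
            raise ForbiddenOperation(op)
    for op, *args in forms:
        allowed[op](*args)


if __name__ == "__main__":
    whitelist = {
        "install-extension": lambda name: print("installing extension", name),
        "install-program": lambda name: print("installing program", name),
    }
    run_setup([("install-extension", "matchable")], whitelist)
    try:
        run_setup([("delete-file", "/etc/hosts")], whitelist)
    except ForbiddenOperation as err:
        print("rejected:", err)
```

Validating every form before executing any of them is a deliberate choice in this sketch: a file that mixes permitted and forbidden operations is rejected as a whole instead of being half-executed.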

Disallow version specification as numbers

We should stop accepting versions as numbers, as they may cause confusion in some cases. For example:

 Version 0.20 is read as 0.2, which is less than 0.19, according to the
 version comparators.  This can cause problems when chicken-install
 takes decisions based on version numbers.
 
 Although (> 0.20 0.19) => #t,  (version>=? 0.20 0.19) => #f.  However,
 (version>=? "0.20" "0.19") => #t.
 
 That's because versions are (read) by setup-api and tokenized using
 `.' as separators.  If versions are numbers, they are read as numbers
 then converted to strings, then parsed by the version API.  So, 0.20
 is read as 0.2, converted to "0.2" and tokenized as ("0" "2").  Then,
 converted back to numbers we have (0 2).  If we apply the same to
 0.19, we have (> 2 19) => #f.
 
 By using versions as strings, we have "0.20" read as a string,
 tokenized as ("0" "20") and converted back to numbers as (0 20).
 Thus, (> 20 19) => #t.

From https://github.com/ckeen/pastiche/commit/862c1d7008342230dcd4a4109376dd381f956207
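The failure mode in the quoted explanation can be reproduced in a few lines of Python, whose float literals behave like the Scheme reader here (0.20 and 0.2 are the same number). tokenize and version_ge are hypothetical stand-ins for the setup-api version handling, assuming a left-to-right comparison of integer components; that is enough to reproduce the example but may not match every detail of the real comparator. parse_version_strict sketches the proposed rule of accepting only strings.

```python
from typing import List, Union


def tokenize(version: Union[str, float]) -> List[int]:
    """Model of the described path: number -> string -> split on '.' -> integers."""
    text = version if isinstance(version, str) else str(version)  # 0.20 becomes "0.2"
    return [int(part) for part in text.split(".")]


def version_ge(a: Union[str, float], b: Union[str, float]) -> bool:
    # Compare the integer components left to right (lexicographic list comparison).
    return tokenize(a) >= tokenize(b)


def parse_version_strict(version: object) -> List[int]:
    """What the proposal asks for: reject numeric versions outright."""
    if not isinstance(version, str):
        raise TypeError(f"version must be a string, got {version!r}")
    return [int(part) for part in version.split(".")]


if __name__ == "__main__":
    print(0.20 > 0.19)                 # True: plain numeric comparison
    print(version_ge(0.20, 0.19))      # False: [0, 2] vs [0, 19]
    print(version_ge("0.20", "0.19"))  # True: [0, 20] vs [0, 19]
```

Rejecting numbers at parse time moves the error to the point where the version is declared, instead of letting it surface later as a surprising dependency-resolution decision.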

More notes

Vasilij Schneidermann's notes on the build system and packaging