Allocating C Structures Under the Control of the CHICKEN GC

  1. Allocating C Structures Under the Control of the CHICKEN GC
    1. CHICKEN GC Basics
    2. Allocating Atomic Data
    3. Allocating non-atomic data
    4. Allocating mixed data
    5. Allocating large data
    6. Why Would You Want to do This?
      1. Speed
      2. Convenience

This tip describes how to allocate C structures such that they will be under the control of the CHICKEN garbage collector.

CHICKEN has an unusual garbage collection scheme, roughly following the procedure described in Cheney on the MTA, Part II. Please keep in mind the manual sections Data representation and Interface to external functions and variables while you read this.

The code in this section will be a mix of C and Scheme; recall that you can embed C code directly in your Scheme programs using (foreign-declare CODE-STRING).

CHICKEN GC Basics

The CHICKEN GC expects all non-immediate objects to have a "tag" value in their first sizeof(C_header) bytes. This tag stores three pieces of information:

The manual section Data representation explains this.

Allocating Atomic Data

"Atomic" data are data which contain no pointers. For example, suppose we want to have the following C structure definition for point-mass bodies in a three-dimensional gravitational simulation:

typedef struct {
  C_header tag;
  double m;
  double q[3];
  double p[3];
} body;

Note that a body structure contains no pointers. We could use a declaration like

(define-foreign-record body
  (scheme-object tag)
  (double m)
  (double q 3)
  (double p 3))

to represent this object within CHICKEN. Unfortunately, if we do that, the procedures body-m, body-q, etc, will expect to have foreign pointer objects which contain a pointer to a body object. But, this body object must be allocated outside the GC-managed heap, because the pointer in a foreign-pointer is not scanned during garbage collection. To manage it, we must either set a finalizer for the pointer object which contains it (using set-finalizer!) or deallocate it by hand. Neither of these is an attractive option for lots of data (maybe we're running a Big Simulation).

So, we'll have to do things by hand. First, let's define the body tag. Since bodies will be fixed-size bytevectors, we'll use

static const C_header BODY_TAG = (sizeof(body) - sizeof(C_header)) | C_BYTEVECTOR_BIT | C_STRING_TYPE;

This stores the size of the data in a body struct (minus the tag size), sets the bit which tells the GC that we're dealing with a bytevector, and gives a body the (arbitrary) scheme type string.

Now we define a make-body Scheme procedure, which takes a mass, f64vector position, and f64vector momentum and produces a body:

(define make-body 
  (foreign-primitive scheme-object ((double m)
                                    (f64vector q)
                                    (f64vector p))
#<<END
   body result;

   result.tag = BODY_TAG;

   result.m = m;
   memcpy(result.q, q, 3*sizeof(double));
   memcpy(result.p, p, 3*sizeof(double));

   C_return (&result);
END
))

Note the return type of scheme-object, since a body is a scheme object. Note also, that we allocate result on the stack, and "return" the address of result at the end of the function. (The foreign-primitive actually never returns; it calls its continuation with the return value.)

We can define a body? predicate. This predicate is not exact---it cannot distinguish between bodies and strings of the same length---but there is no way to make it exact with CHICKEN's memory model as it stands now.

static C_word body_p(C_word obj) {
  if (C_immediatep(obj)) {
    return C_SCHEME_FALSE;
  } else if (C_block_header(obj) == BODY_TAG) {
    return C_SCHEME_TRUE;
  } else {
    return C_SCHEME_FALSE;
  }
}

and

(define body? (foreign-lambda scheme-object "body_p" scheme-object))

Defining accessors is similarly straightforward.

Allocating non-atomic data

"Non-atomic" data are data which contain only pointers. Suppose we have a binary tree datastructure which can store bodies from the above example (contrived, I know, but it should illustrate the technique):

struct btree_struct {
  C_header tag;
  body *b;
  struct btree_struct *left;
  struct btree_struct *right;
};
typedef struct btree_struct btree;

Now we define a tag which indicates that this is a vector-like object (in fact, it will be a Scheme vector), and its size:

static const C_header BTREE_TAG = ((sizeof(btree) - sizeof(C_header)) / sizeof(C_word)) | C_VECTOR_TYPE;

Note that, for non-bytevector objects, the size part of the header is the size in C_words, not bytes. From here, things follow the examples in the last section.

Allocating mixed data

You can't. Sorry. Break your data up into one atomic struct which stores all the non-pointers, and then a "pointer" struct which stores a pointer to the atomic data, and all other pointers involved in your datastructure.

Allocating large data

This is also not a good idea---it's easy to overflow the stack (which is the nursery in the generational GC that CHICKEN uses). It's non-trivial to allocate some data directly in the second generation. If you absolutely must do this, then have a look at C_allocate_vector in the runtime.c file of the CHICKEN distribution. Good luck.

Why Would You Want to do This?

Speed

CHICKEN's allocation is fast. Much faster than malloc and free---it just requires bumping the stack pointer. So, if you're looking for speed (which you probably are if you're writing C code for CHICKEN) you're much better off using it than trying to malloc your datastructures and stuff them into CHICKEN's pointer type.

Convenience

If you malloc your datastructures, you'll have to either

The first option is not attractive if you have complicated data with lots of pointers between structures (that's why GC's were invented, after all). The second option is OK for a couple of objects, but CHICKEN doesn't play very well with millions of finalizers registered. If you want to allocate a lot of objects, don't register finalizers for them.