source: project/wiki/allocating-c-structures-under-control-of-the-chicken-gc @ 5796

Last change on this file since 5796 was 5796, checked in by wmfarr, 12 years ago

Changes applied for wmfarr (18.95.7.238) through svnwiki:

Fixed typos, added "Why would you want to do this" section.

File size: 6.6 KB
Line 
1== Allocating C Structures Under the Control of the Chicken GC
2
3[[toc:]]
4
5This tip describes how to allocate C structures such that they will be under the control of the Chicken garbage collector.
6
7Chicken has an unusual garbage collection scheme, roughly following the procedure described in [[http://home.pipeline.com/~hbaker1/CheneyMTA.html|Cheney on the MTA, Part II]].  Please keep in mind the manual sections [[Data representation]] and [[Interface to external functions and variables]] while you read this.
8
9The code in this section will be a mix of C and Scheme; recall that you can embed C code directly in your Scheme programs using {{(foreign-declare CODE-STRING)}}.
10
11=== Chicken GC Basics
12
13The Chicken GC expects all non-immediate objects to have a "tag" value in their first {{sizeof(C_header)}} bytes.  This tag stores three pieces of information:
14
15* The type of the object (i.e. {{C_STRING_TYPE}}, {{C_VECTOR_TYPE}}, etc).
16* Various GC tags (one which is relevant to us is {{C_BYTEVECTOR_BIT}}).
17* The size of the object (in bytes if it's a bytevector type, or in words if it's not).
18
19The manual section [[Data representation]] explains this.
20
21=== Allocating Atomic Data
22
23"Atomic" data are data which contain no pointers.  For example, suppose we want to have the following C structure definition for point-mass bodies in a three-dimensional gravitational simulation:
24
25<enscript highlight=c>
26typedef struct {
27  C_header tag;
28  double m;
29  double q[3];
30  double p[3];
31} body;
32</enscript>
33
34Note that a {{body}} structure contains no pointers.  We ''could'' use a declaration like
35
36<enscript highlight=scheme>
37(define-foreign-record body
38  (scheme-object tag)
39  (double m)
40  (double q 3)
41  (double p 3))
42</enscript>
43
44to represent this object within Chicken.  Unfortunately, if we do that, the procedures {{body-m}}, {{body-q}}, etc, will expect to have foreign pointer objects which contain a pointer to a {{body}} object.  But, this body object must be allocated outside the GC-managed heap, because the pointer in a foreign-pointer is not scanned during garbage collection.  To manage it, we must either set a finalizer for the pointer object which contains it (using {{set-finalizer!}}) or deallocate it by hand.  Neither of these is an attractive option for lots of data (maybe we're running a [[http://www.mpa-garching.mpg.de/galform/millennium/|Big Simulation]]).
45
46So, we'll have to do things by hand.  First, let's define the body tag.  Since bodies will be fixed-size bytevectors, we'll use
47
48<enscript highlight=c>
49static const C_header BODY_TAG = (sizeof(body) - sizeof(C_header)) | C_BYTEVECTOR_BIT | C_STRING_TYPE;
50</enscript>
51
52This stores the size of the data in a body struct (minus the tag size), sets the bit which tells the GC that we're dealing with a bytevector, and gives a body the (arbitrary) scheme type {{string}}. 
53
54Now we define a {{make-body}} Scheme procedure, which takes a mass, f64vector position, and f64vector momentum and produces a body:
55
56<enscript highlight=scheme>
57(define make-body
58  (foreign-primitive scheme-object ((double m)
59                                    (f64vector q)
60                                    (f64vector p))
61#<<END
62   body result;
63
64   result.tag = BODY_TAG;
65
66   result.m = m;
67   memcpy(result.q, q, 3*sizeof(double));
68   memcpy(result.p, p, 3*sizeof(double));
69
70   C_return (&result);
71END
72))
73</enscript>
74
75Note the return type of {{scheme-object}}, since a {{body}} is a scheme object.  Note also, that we allocate {{result}} on the stack, and "return" the address of {{result}} at the end of the function.  (The {{foreign-primitive}} actually never returns; it calls its continuation with the return value.)
76
77We can define a {{body?}} predicate.  This predicate is not exact---it cannot distinguish between bodies and strings of the same length---but there is no way to make it exact with Chicken's memory model as it stands now.
78
79<enscript highlight=c>
80static C_word body_p(C_word obj) {
81  if (C_immediatep(obj)) {
82    return C_SCHEME_FALSE;
83  } else if (C_block_header(obj) == BODY_TAG) {
84    return C_SCHEME_TRUE;
85  } else {
86    return C_SCHEME_FALSE;
87  }
88}
89</enscript>
90
91and
92
93<enscript highlight=scheme>
94(define body? (foreign-lambda scheme-object "body_p" scheme-object))
95</enscript>
96
97Defining accessors is similarly straightforward.
98
99=== Allocating non-atomic data
100
101"Non-atomic" data are data which contain only pointers.  Suppose we have a binary tree datastructure which can store bodies from the above example (contrived, I know, but it should illustrate the technique):
102
103<enscript highlight=c>
104struct btree_struct {
105  C_header tag;
106  body *b;
107  struct btree_struct *left;
108  struct btree_struct *right;
109};
110typedef struct btree_struct btree;
111</enscript>
112
113Now we define a tag which indicates that this is a vector-like object (in fact, it will be a Scheme vector), and its size:
114
115<enscript highlight=c>
116static const C_header BTREE_TAG = ((sizeof(btree) - sizeof(C_header)) / sizeof(C_word)) | C_VECTOR_TYPE;
117</enscript>
118
119Note that, for non-bytevector objects, the size part of the header is the size in {{C_word}}s, not bytes.  From here, things follow the examples in the last section.
120
121=== Allocating mixed data
122
123You can't.  Sorry.  Break your data up into one atomic struct which stores all the non-pointers, and then a "pointer" struct which stores a pointer to the atomic data, and all other pointers involved in your datastructure.
124
125=== Allocating large data
126
127This is also not a good idea---it's easy to overflow the stack (which is the nursery in the generational GC that Chicken uses).  It's non-trivial to allocate some data directly in the second generation.  If you absolutely must do this, then have a look at {{C_allocate_vector}} in the {{runtime.c}} file of the Chicken distribution.  Good luck.
128
129=== Why Would You Want to do This?
130
131==== Speed
132
133Chicken's allocation is '''fast'''.  Much faster than {{malloc}} and {{free}}---it just requires bumping the stack pointer.  So, if you're looking for speed (which you probably are if you're writing C code for Chicken) you're much better off using it than trying to {{malloc}} your datastructures and stuff them into Chicken's {{pointer}} type.
134
135==== Convenience
136
137If you {{malloc}} your datastructures, you'll have to either
138
139* Explicitly {{free}} them when you're done OR
140* Register a finalizer for them using {{set-finalizer!}}
141
142The first option is not attractive if you have complicated data with lots of pointers between structures (that's why GC's were invented, after all).  The second option is OK for a couple of objects, but Chicken doesn't play very well with millions of finalizers registered.  If you want to allocate a lot of objects, don't register finalizers for them.
Note: See TracBrowser for help on using the repository browser.