[[toc:]]
[[tags: manual]]

== Data representation

There exist two different kinds of data objects in the CHICKEN system:
immediate and non-immediate objects.

=== Immediate objects

Immediate objects are represented by a single machine word, 32 or 64 bits depending on the architecture. They come in four different flavors:

'''fixnums''', that is, small exact integers, where the lowest-order bit is
set to 1. This gives fixnums a range of 31 bits for the actual
numeric value (63 bits on 64-bit architectures).

'''characters''', where the four lowest-order bits are equal to
{{C_CHARACTER_BITS}}, currently 1010. The Unicode code point
of the character is encoded in the next 24 bits.

'''booleans''', where the four lowest-order bits are equal to {{C_BOOLEAN_BITS}},
currently 0110. The next bit is one for #t and zero for #f.

'''other values''': the empty list, the value of unbound identifiers,
the undefined value (void), and end-of-file. The four lowest-order bits are equal to
{{C_SPECIAL_BITS}}, currently 1110. The next four bits contain an identifying
number for this type of object, one of:
{{C_SCHEME_END_OF_LIST}}, currently 0000;
{{C_SCHEME_UNDEFINED}}, currently 0001;
{{C_SCHEME_UNBOUND}}, currently 0010; or
{{C_SCHEME_END_OF_FILE}}, currently 0011.

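The immediate encodings can be sketched in C as follows. This is only an illustration under stated assumptions: the helper names ({{make_fixnum}}, {{fixnum_value}}, {{make_boolean}}, {{make_special}} and the {{is_...}} predicates) are hypothetical and are not the macros defined in {{chicken.h}}, and the tag constants simply repeat the bit patterns quoted in the text. Characters are only recognized by their tag here; extracting the code point is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word;   /* one machine word */

/* Tag patterns as quoted in the text; the names are illustrative only. */
#define TAG_MASK        ((word)0xF)
#define CHARACTER_BITS  ((word)0xA)  /* 1010 */
#define BOOLEAN_BITS    ((word)0x6)  /* 0110 */
#define SPECIAL_BITS    ((word)0xE)  /* 1110 */

/* Fixnum: shift the value left by one and set the lowest bit. */
static inline word make_fixnum(intptr_t n) {
    return ((word)n << 1) | (word)1;
}

/* Inverse of make_fixnum. Assumes that >> on a negative intptr_t is an
   arithmetic shift, as it is with mainstream C compilers. */
static inline intptr_t fixnum_value(word w) {
    assert((w & 1) == 1);
    return (intptr_t)w >> 1;
}

/* Boolean: tag 0110, with the next bit (bit 4) set for #t. */
static inline word make_boolean(int truth) {
    return BOOLEAN_BITS | ((word)(truth != 0) << 4);
}

/* Special value: tag 1110, with a 4-bit identifying number above it. */
static inline word make_special(unsigned id) {
    assert(id <= 0xF);
    return SPECIAL_BITS | ((word)id << 4);
}

static inline int is_fixnum(word w)    { return (w & 1) == 1; }
static inline int is_character(word w) { return (w & TAG_MASK) == CHARACTER_BITS; }
static inline int is_boolean(word w)   { return (w & TAG_MASK) == BOOLEAN_BITS; }
static inline int is_special(word w)   { return (w & TAG_MASK) == SPECIAL_BITS; }
```

With these helpers, {{make_fixnum(5)}} yields the word 1011 in binary, and {{make_special(3)}} yields 0011 1110, the end-of-file pattern listed above.
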
=== Non-immediate objects

Collectively, the two lowest-order bits are known as the ''immediate mark bits''. When the lowest bit is set, the object is a fixnum, as described above, and the next bit is part of its value. When the lowest bit is clear but the next bit is set, it is an immediate object other than a fixnum. If neither bit is set, the object is non-immediate, as described below.

Non-immediate objects are blocks of data represented by a pointer into
the heap. The pointer's immediate mark bits must be zero to indicate the object is non-immediate;
this guarantees the data block is aligned on a 4-byte boundary, at minimum. Alignment of data words
is required on modern architectures anyway, so we get the ability to distinguish between immediate and non-immediate objects for free.

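The test on the two immediate mark bits can be sketched as a small C function. The type and function names ({{kind}}, {{classify}}, {{block_address}}) are hypothetical and chosen for this example only; the real predicates live in {{chicken.h}}.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word;   /* one machine word */

typedef enum {
    KIND_FIXNUM,           /* lowest bit set */
    KIND_OTHER_IMMEDIATE,  /* lowest bit clear, next bit set */
    KIND_BLOCK_POINTER     /* both bits clear */
} kind;

/* Classify a word by looking only at its two lowest-order bits. */
static inline kind classify(word w) {
    if (w & 1) return KIND_FIXNUM;
    if (w & 2) return KIND_OTHER_IMMEDIATE;
    return KIND_BLOCK_POINTER;
}

/* Reinterpret a non-immediate word as the address of its data block.
   Because both mark bits are zero, the address is a multiple of four. */
static inline void *block_address(word w) {
    assert(classify(w) == KIND_BLOCK_POINTER);
    return (void *)w;
}
```

Note that no masking is needed in {{block_address}}: a non-immediate word already is the pointer, which is the "for free" property mentioned above.
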
The first word of the data block contains a header, which gives
information about the type of the object. The header is a
single machine word.

The 24 lowest-order bits contain the length of the data object, which is either
the number of bytes in a string or byte-vector, or the number
of elements for a vector or record type.

The remaining bits are placed in the high-order end of the header.
The four highest-order bits are used for garbage
collection or internal data type dispatching.

; C_GC_FORWARDING_BIT : Flag used for forwarding garbage collected object pointers.

; C_BYTEBLOCK_BIT : Flag that specifies whether this data object contains raw bytes (a string or blob) or pointers to other data objects.

; C_SPECIALBLOCK_BIT : Flag that specifies whether this object contains a ''special'' non-object pointer value in its first slot. An example of this kind of object is a closure, which is a vector-type object with the code pointer as its first item.

; C_8ALIGN_BIT : Flag that specifies whether the data area of this block should be aligned on an 8-byte boundary (floating-point numbers, for example).

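As a rough sketch, the length field and the four flag bits of a 32-bit header could be picked apart as shown here. The {{HDR_...}} names and the helper functions are hypothetical. The assignment of each flag to a particular bit is an assumption (the flags are taken in the order listed, highest bit first); {{chicken.h}} is the authority for the actual values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 32-bit layout: 24 length bits at the bottom, four flag bits
   at the top. The flag positions are an assumption, see the text. */
#define HDR_SIZE_MASK      UINT32_C(0x00FFFFFF)
#define HDR_GC_FORWARDING  UINT32_C(0x80000000)
#define HDR_BYTEBLOCK      UINT32_C(0x40000000)
#define HDR_SPECIALBLOCK   UINT32_C(0x20000000)
#define HDR_8ALIGN         UINT32_C(0x10000000)

/* Combine flag bits with a length that must fit into 24 bits. */
static inline uint32_t make_header(uint32_t flags, uint32_t size) {
    assert((size & ~HDR_SIZE_MASK) == 0);
    return flags | size;
}

static inline uint32_t header_size(uint32_t h) { return h & HDR_SIZE_MASK; }
static inline int header_is_byteblock(uint32_t h)    { return (h & HDR_BYTEBLOCK) != 0; }
static inline int header_is_specialblock(uint32_t h) { return (h & HDR_SPECIALBLOCK) != 0; }
static inline int header_needs_8align(uint32_t h)    { return (h & HDR_8ALIGN) != 0; }

/* Size of the data area in bytes: the length counts bytes for byte
   blocks and slots (one word each) for everything else. */
static inline uint32_t header_payload_bytes(uint32_t h, uint32_t word_bytes) {
    uint32_t n = header_size(h);
    return header_is_byteblock(h) ? n : n * word_bytes;
}
```

For example, a header built with {{make_header(HDR_BYTEBLOCK, 5)}} describes five raw bytes, while {{make_header(0, 3)}} describes three slots, that is, twelve bytes of data on a machine with 4-byte words.
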
After these four bits comes a 4-bit type code representing one of the following types:

'''vectors''': vector objects with type bits {{C_VECTOR_TYPE}}, currently 0000.

'''symbols''': vector objects with type bits {{C_SYMBOL_TYPE}}, currently 0001. The three slots
contain the toplevel variable value, the print-name (a string), and the property list
of the symbol.

'''strings''': byte-vector objects with type bits {{C_STRING_TYPE}}, currently 0010.

'''pairs''': vector-like objects with type bits {{C_PAIR_TYPE}}, currently 0011.
The car and the cdr are contained in the first and second slots,
respectively.

'''closures''': special vector objects with type bits
{{C_CLOSURE_TYPE}}, currently 0100. The first slot contains a pointer to a
compiled C function. Any extra slots contain the free variables (since
a flat closure representation is used).

'''flonums''': byte-vector objects with type bits
{{C_FLONUM_BITS}}, currently 0101. Slots one and two (or a single slot on
64-bit architectures) contain a 64-bit floating-point number, in the
representation used by the host system's C compiler.

'''ports''': special vector objects with type bits
{{C_PORT_TYPE}}, currently 0111. The first slot contains a pointer to a
file stream if this is a file port, or NULL if not. The other slots
contain housekeeping data used for this port.

'''structures''': vector objects with type bits
{{C_STRUCTURE_TYPE}}, currently 1000. The first slot contains a symbol that
specifies the kind of structure this record is an instance of. The other
slots contain the actual record items.

'''pointers''': special vector objects with type bits
{{C_POINTER_TYPE}}, currently 1001. The single slot contains a machine pointer.

'''locatives''': special vector objects with type bits
{{C_LOCATIVE_TYPE}}, currently 1010. A locative object holds four slots:
a raw pointer to the location inside the object referred to by the locative,
the offset in bytes from the start of the object referred to, the type of
the location (whether it refers to an unboxed numeric value or a normal
object slot that holds a pointer to Scheme data), and a flag indicating
whether this locative is "weak". If the locative is non-weak, slot #4 holds
a pointer to the object referred to.

'''tagged pointers''': special vector objects with type bits
{{C_TAGGED_POINTER_TYPE}}, currently 1011. Tagged pointers are similar to pointers,
but the object contains an additional
slot with a tag (an arbitrary data object) that identifies the type
of the pointer.

'''lambda infos''': byte-vector objects with type bits {{C_LAMBDA_INFO_TYPE}}, currently 1101.

'''buckets''': vector objects with type bits {{C_BUCKET_TYPE}}, currently 1111. These are
only used internally for the implementation of symbol tables.

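The type codes listed above can be collected into a small lookup table. The following C sketch assumes a 32-bit header in which the 4-bit type code sits directly below the four flag bits, that is, in bits 24 to 27. The names {{type_names}}, {{header_type_code}} and {{type_name}} are hypothetical, and codes that the list does not mention are reported as "unlisted".

```c
#include <assert.h>
#include <stdint.h>

/* Names for the 4-bit type codes given in the text. Codes without an
   entry in the list stay NULL. */
static const char *const type_names[16] = {
    [0x0] = "vector",    [0x1] = "symbol",   [0x2] = "string",
    [0x3] = "pair",      [0x4] = "closure",  [0x5] = "flonum",
    [0x7] = "port",      [0x8] = "structure",
    [0x9] = "pointer",   [0xA] = "locative", [0xB] = "tagged pointer",
    [0xD] = "lambda info",
    [0xF] = "bucket"
};

/* Extract the type code, assuming it occupies bits 24..27 of a
   32-bit header (just below the four flag bits). */
static inline unsigned header_type_code(uint32_t h) {
    return (unsigned)((h >> 24) & 0xF);
}

/* Map a type code to a readable name. */
static inline const char *type_name(unsigned code) {
    assert(code <= 0xF);
    return type_names[code] ? type_names[code] : "unlisted";
}
```

A header word such as 0x03000002 would then decode as a pair with two slots.
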
The actual data follows immediately after the header. Note that
block addresses are always aligned to the native machine-word
boundary.

Data objects may be allocated outside of the garbage-collected heap, as
long as their layout follows the scheme described above. Care must be
taken, however, not to store heap data (i.e. non-immediate objects) in
these objects, because this will confuse the garbage collector.

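To illustrate, a pair laid out by hand in static storage might look like the following C sketch. It is not taken from {{chicken.h}}: the names {{pair_block}}, {{static_pair}}, {{static_set_car}} and the macros are hypothetical, and the sketch assumes that the eight non-length bits of the header occupy the top byte of the machine word.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word;   /* one machine word */

/* A block with two slots: the header word, then the data. */
typedef struct {
    word header;
    word slots[2];
} pair_block;

/* Assumption: flags and type code live in the top byte of the word. */
#define TYPE_SHIFT      (sizeof(word) * 8 - 8)
#define PAIR_TYPE_CODE  ((word)0x3)            /* 0011 */
#define FIXNUM(n)       ((((word)(n)) << 1) | 1)
#define END_OF_LIST     ((word)0x0E)           /* special tag 1110, id 0000 */

/* The list (1), allocated outside any garbage-collected heap. Both
   slots hold immediate values only. */
pair_block static_pair = {
    (PAIR_TYPE_CODE << TYPE_SHIFT) | 2,        /* pair, length 2 */
    { FIXNUM(1), END_OF_LIST }
};

/* Store into the car, refusing anything that is not an immediate. This
   is deliberately conservative: it also rejects pointers to other
   static blocks, which would be harmless. */
static inline void static_set_car(pair_block *p, word v) {
    assert((v & 3) != 0);
    p->slots[0] = v;
}
```

The assertion in {{static_set_car}} reflects the rule above: as long as only immediates are stored, the collector never has to look inside the static block.
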
For more information see the header file {{chicken.h}}.

---
Previous: [[Cross development]]

Next: [[Bugs and limitations]]