source: project/chicken/branches/prerelease/manual/Data representation @ 15101

Last change on this file since 15101 was 15101, checked in by felix winkelmann, 10 years ago

merged trunk changes from 14491:15100 into prerelease branch

[[tags: manual]]

== Data representation

''Note: In all cases below, bits are numbered starting at 1 and beginning with the lowest-order bit.''

The CHICKEN system has two kinds of data objects:
immediate and non-immediate objects.

=== Immediate objects

Immediate objects are represented by a single machine word, which is usually 32 bits long, or 64 bits
on 64-bit architectures. Immediate objects
come in four different flavors:

'''fixnums''', that is, small exact integers, where bit 1 is
set to 1. This leaves 31 bits for the actual
numeric value (63 bits on 64-bit architectures).

'''characters''', where bits 1-4 are equal to {{C_CHARACTER_BITS}}. The
Unicode code point of the character is encoded in bits 9 to 32.

'''booleans''', where bits 1-4 are equal to {{C_BOOLEAN_BITS}}. Bit 5
is one for #t and zero for #f.

'''other values''': the empty list, the value of unbound identifiers,
the undefined value (void), and end-of-file.  Bits 1-4 are equal to {{C_SPECIAL_BITS}}; bits 5 to 8 contain a number
identifying the particular value.  The following constants are
defined: {{C_SCHEME_END_OF_LIST}}, {{C_SCHEME_UNDEFINED}}, {{C_SCHEME_UNBOUND}} and
{{C_SCHEME_END_OF_FILE}}.
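
As a rough illustration of the encodings above, the following C sketch packs and unpacks characters, booleans and special values. The {{ex_}} names are hypothetical, and the concrete tag values (0x06, 0x0a, 0x0e) are assumptions made for this example; the authoritative definitions are the ones in {{chicken.h}}.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t ex_word;                 /* hypothetical name for a machine word */

/* Assumed values for bits 1-4; check chicken.h for the real constants. */
#define EX_TYPE_MASK      ((ex_word)0x0f)
#define EX_BOOLEAN_BITS   ((ex_word)0x06)
#define EX_CHARACTER_BITS ((ex_word)0x0a)
#define EX_SPECIAL_BITS   ((ex_word)0x0e)

/* Character: tag in bits 1-4, code point in bits 9 to 32. */
static ex_word ex_make_char(uint32_t code_point)
{
    assert(code_point <= 0xFFFFFF);        /* must fit in 24 bits */
    return ((ex_word)code_point << 8) | EX_CHARACTER_BITS;
}

static uint32_t ex_char_code(ex_word w)
{
    assert((w & EX_TYPE_MASK) == EX_CHARACTER_BITS);
    return (uint32_t)((w >> 8) & 0xFFFFFF);
}

/* Boolean: tag in bits 1-4, bit 5 is one for #t and zero for #f. */
static ex_word ex_make_bool(int truth)
{
    return (truth ? (ex_word)1 << 4 : (ex_word)0) | EX_BOOLEAN_BITS;
}

static int ex_is_true(ex_word w)
{
    assert((w & EX_TYPE_MASK) == EX_BOOLEAN_BITS);
    return (int)((w >> 4) & 1);
}

/* Special value: tag in bits 1-4, identifying number in bits 5 to 8. */
static ex_word ex_make_special(unsigned id)
{
    assert(id <= 0xF);
    return ((ex_word)id << 4) | EX_SPECIAL_BITS;
}
```

Under these assumed tag values, {{ex_make_bool(1)}} yields 0x16 and {{ex_make_bool(0)}} yields 0x06.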

Collectively, bits 1 and 2 are known as the ''immediate mark bits''.  When bit 1 is set, the object is a fixnum, as described above, and bit 2 is part of its value.  When bit 1 is clear but bit 2 is set, it is an immediate object other than a fixnum.  If neither bit 1 nor bit 2 is set, the object is non-immediate, as described below.
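
The three-way test on the immediate mark bits can be sketched in C as follows. All names are hypothetical and chosen for this example only. The fixnum decoder assumes that right-shifting a negative signed integer is an arithmetic shift, which is implementation-defined in C but holds on common compilers.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t ex_word;                      /* hypothetical name for a machine word */

#define EX_FIXNUM_BIT          ((ex_word)0x1)   /* bit 1 */
#define EX_IMMEDIATE_MARK_BITS ((ex_word)0x3)   /* bits 1 and 2 */

typedef enum { EX_FIXNUM, EX_OTHER_IMMEDIATE, EX_NON_IMMEDIATE } ex_kind;

/* Classify a word by looking only at its two lowest bits. */
static ex_kind ex_classify(ex_word w)
{
    if (w & EX_FIXNUM_BIT)
        return EX_FIXNUM;                       /* bit 2 belongs to the value */
    if (w & EX_IMMEDIATE_MARK_BITS)
        return EX_OTHER_IMMEDIATE;              /* bit 1 clear, bit 2 set */
    return EX_NON_IMMEDIATE;                    /* both bits clear: a block pointer */
}

/* Encode n as a fixnum: shift the value up by one bit and set bit 1. */
static ex_word ex_fix(intptr_t n)
{
    assert(n >= INTPTR_MIN / 2 && n <= INTPTR_MAX / 2);
    return ((ex_word)n << 1) | EX_FIXNUM_BIT;
}

/* Decode a fixnum; assumes an arithmetic right shift for negative values. */
static intptr_t ex_unfix(ex_word w)
{
    assert(w & EX_FIXNUM_BIT);
    return (intptr_t)w >> 1;
}
```

For example, {{ex_fix(42)}} yields the word 85 (binary 1010101), whose lowest bit marks it as a fixnum.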

=== Non-immediate objects

Non-immediate objects are blocks of data represented by a pointer into
the heap.  The pointer's immediate mark bits (bits 1 and 2) must be zero to indicate the object is non-immediate;
this means the data block must be aligned on a 4-byte boundary, at minimum.  Alignment of data words
is required on modern architectures anyway, so we get the ability to distinguish between immediate and non-immediate objects for free.
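
A small C sketch of this convention: a block address can be used directly as a word, provided its two lowest bits are zero. The names are hypothetical and the checks are only for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t ex_word;                      /* hypothetical name for a machine word */

#define EX_IMMEDIATE_MARK_BITS ((ex_word)0x3)   /* bits 1 and 2 */

/* Use a block address as a word; only valid for 4-byte-aligned addresses. */
static ex_word ex_word_from_block(const void *block)
{
    ex_word w = (ex_word)block;
    assert((w & EX_IMMEDIATE_MARK_BITS) == 0);  /* otherwise it would look immediate */
    return w;
}

/* Recover the block address from a non-immediate word. */
static void *ex_block_from_word(ex_word w)
{
    assert((w & EX_IMMEDIATE_MARK_BITS) == 0);
    return (void *)w;
}
```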

The first word of the data block contains a header, which gives
information about the type of the object. The header has the size of a
machine word, usually 32 bits (64 bits on 64-bit architectures).

Bits 1 to 24 contain the length of the data object, which is either
the number of bytes in a string (or byte-vector) or the number
of elements for a vector or for a structure type.

Bits 25 to 28 contain the type code of the object.

Bits 29 to 32 contain miscellaneous flags used for garbage
collection or internal data type dispatching.
These flags are:

; C_GC_FORWARDING_BIT : Flag used for forwarding garbage collected object pointers.

; C_BYTEBLOCK_BIT : Flag that specifies whether this data object contains raw bytes (a string or byte-vector) or pointers to other data objects.

; C_SPECIALBLOCK_BIT : Flag that specifies whether this object contains a ''special'' non-object pointer value in its first slot. Closures are an example of this kind of object: they are vector-type objects with the code pointer as the first item.

; C_8ALIGN_BIT : Flag that specifies whether the data area of this block should be aligned on an 8-byte boundary (floating-point numbers, for example).
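
The field layout of a 32-bit header described above can be sketched in C like this. The names are hypothetical, and the sketch deliberately does not say which of the four flag bits corresponds to which flag; see {{chicken.h}} for that.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions of a 32-bit header as described in the text. */
#define EX_LENGTH_MASK  UINT32_C(0x00ffffff)   /* bits 1 to 24 */
#define EX_TYPE_SHIFT   24                     /* bits 25 to 28 */
#define EX_FLAGS_SHIFT  28                     /* bits 29 to 32 */
#define EX_NIBBLE_MASK  UINT32_C(0xf)

typedef struct {
    uint32_t length;   /* bytes or elements, depending on the object */
    unsigned type;     /* 4-bit type code */
    unsigned flags;    /* 4 flag bits */
} ex_header_fields;

/* Assemble a header word from its three fields. */
static uint32_t ex_make_header(unsigned flags, unsigned type, uint32_t length)
{
    assert(flags <= 0xf && type <= 0xf && length <= EX_LENGTH_MASK);
    return ((uint32_t)flags << EX_FLAGS_SHIFT)
         | ((uint32_t)type << EX_TYPE_SHIFT)
         | length;
}

/* Split a header word back into its fields. */
static ex_header_fields ex_split_header(uint32_t header)
{
    ex_header_fields f;
    f.length = header & EX_LENGTH_MASK;
    f.type   = (unsigned)((header >> EX_TYPE_SHIFT) & EX_NIBBLE_MASK);
    f.flags  = (unsigned)((header >> EX_FLAGS_SHIFT) & EX_NIBBLE_MASK);
    return f;
}
```

Because the length occupies 24 bits, a 32-bit header in this layout can describe at most 16777215 bytes or elements.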

The actual data follows immediately after the header. Note that
block addresses are always aligned to the native machine-word
boundary. Scheme data objects map to blocks in the following manner:

'''pairs''': vector-like object (type bits {{C_PAIR_TYPE}}),
where the car and the cdr are contained in the first and second slots,
respectively.

'''vectors''': vector object (type bits {{C_VECTOR_TYPE}}).

'''strings''': byte-vector object (type bits {{C_STRING_TYPE}}).

'''procedures''': special vector object (type bits
{{C_CLOSURE_TYPE}}). The first slot contains a pointer to a
compiled C function. Any extra slots contain the free variables (since
a flat closure representation is used).

'''flonums''': a byte-vector object (type bits
{{C_FLONUM_BITS}}). Slots one and two (or a single slot on
64-bit architectures) contain a 64-bit floating-point number, in the
representation used by the host system's C compiler.

'''symbols''': a vector object (type bits {{C_SYMBOL_TYPE}}). Slots
one and two contain the toplevel variable value and the print-name
(a string) of the symbol, respectively.

'''ports''': a special vector object (type bits
{{C_PORT_TYPE}}). The first slot contains a pointer to a file-stream,
if this is a file-pointer, or NULL if not. The other slots
contain housekeeping data used for this port.

'''structures''': a vector object (type bits
{{C_STRUCTURE_TYPE}}). The first slot contains a symbol that
specifies the kind of structure this record is an instance of. The other
slots contain the actual record items.

'''pointers''': a special vector object (type bits
{{C_POINTER_TYPE}}). The single slot contains a machine pointer.

'''tagged pointers''': similar to a pointer (type bits
{{C_TAGGED_POINTER_TYPE}}), but the object contains an additional
slot with a tag (an arbitrary data object) that identifies the type
of the pointer.
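
To make the block layout concrete, here is a minimal C sketch of a pair as a header word followed by two slots. It is an illustration only: the names are hypothetical, the type code 0x3 is an assumed placeholder rather than the value of {{C_PAIR_TYPE}}, and the header uses the 32-bit field positions regardless of the host word size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t ex_word;                    /* hypothetical name for a machine word */

#define EX_LENGTH_MASK    ((ex_word)0x00ffffff)   /* bits 1 to 24 */
#define EX_TYPE_SHIFT     24
#define EX_PAIR_TYPE_CODE ((ex_word)0x3)      /* assumed placeholder type code */

/* A block is a header word followed directly by its slots:
   block[0] = header, block[1] = first slot, block[2] = second slot, ... */

/* Fill in a three-word block as a pair: header, car, cdr. */
static void ex_init_pair(ex_word block[3], ex_word car, ex_word cdr)
{
    block[0] = (EX_PAIR_TYPE_CODE << EX_TYPE_SHIFT) | (ex_word)2;  /* two slots */
    block[1] = car;
    block[2] = cdr;
}

/* Number of slots, read from the length field of the header. */
static size_t ex_block_length(const ex_word *block)
{
    return (size_t)(block[0] & EX_LENGTH_MASK);
}

/* Fetch slot i (zero-based); the data starts right after the header. */
static ex_word ex_block_slot(const ex_word *block, size_t i)
{
    assert(i < ex_block_length(block));
    return block[1 + i];
}
```

The other vector-like objects follow the same pattern with a different type code and slot count; a closure, for instance, would keep its code pointer in the first slot and its free variables in the remaining ones.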

Data objects may be allocated outside of the garbage-collected heap, as
long as their layout follows the above-mentioned scheme. But care has to
be taken not to mutate these objects with heap data (i.e. non-immediate
objects), because this will confuse the garbage collector.

For more information see the header file {{chicken.h}}.

---
Previous: [[Extensions]]

Next: [[Bugs and limitations]]