source: project/wiki/man/4/Data representation @ 27416

Last change on this file since 27416 was 27416, checked in by Mario Domenech Goulart, 9 years ago

Data representation (manual): remove duplicate entry for `pointer'.

[[toc:]]
[[tags: manual]]

== Data representation

There exist two different kinds of data objects in the CHICKEN system:
immediate and non-immediate objects.

=== Immediate objects

Immediate objects are represented by a single machine word, 32 or 64 bits depending on the architecture.  They come in four different flavors:

'''fixnums''', that is, small exact integers, where the lowest-order bit is
set to 1. This gives fixnums a range of 31 bits for the actual
numeric value (63 bits on 64-bit architectures).

'''characters''', where the four lowest-order bits are equal to
{{C_CHARACTER_BITS}}, currently 1010. The Unicode code point
of the character is encoded in the next 24 bits.

'''booleans''', where the four lowest-order bits are equal to {{C_BOOLEAN_BITS}},
currently 0110. The next bit is one for #t and zero for #f.

'''other values''': the empty list, the value of unbound identifiers,
the undefined value (void), and end-of-file.  The four lowest-order bits are equal to
{{C_SPECIAL_BITS}}, currently 1110.  The next four bits contain an identifying
number for this type of object, one of:
{{C_SCHEME_END_OF_LIST}}, currently 0000;
{{C_SCHEME_UNDEFINED}}, currently 0001;
{{C_SCHEME_UNBOUND}}, currently 0010; or
{{C_SCHEME_END_OF_FILE}}, currently 0011.

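The immediate encodings described above can be sketched in C. This is a minimal illustration, not code taken from {{chicken.h}}: the names {{word}}, {{fix_encode}}, {{fix_decode}}, {{bool_encode}}, {{special_encode}} and the {{is_...}} predicates are hypothetical, and only the bit patterns quoted above come from this page. Decoding a negative fixnum assumes that the compiler implements a right shift of a signed value as an arithmetic shift, as mainstream compilers do. Character payloads are not modeled here; only the character tag is checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a machine word; not a chicken.h type. */
typedef uintptr_t word;

/* Four-bit tags quoted in the text above. */
#define TAG_MASK       ((word)0xF)
#define CHARACTER_BITS ((word)0xA) /* 1010 */
#define BOOLEAN_BITS   ((word)0x6) /* 0110 */
#define SPECIAL_BITS   ((word)0xE) /* 1110 */

/* Fixnum: the value is shifted left by one and the lowest bit is set. */
static inline word fix_encode(intptr_t n) {
    return ((word)n << 1) | (word)1;
}

/* Assumes an arithmetic right shift for negative values. */
static inline intptr_t fix_decode(word w) {
    assert((w & 1) == 1); /* only meaningful for fixnums */
    return (intptr_t)w >> 1;
}

/* Boolean: the bit above the tag is 1 for #t and 0 for #f. */
static inline word bool_encode(bool b) {
    return ((word)(b ? 1 : 0) << 4) | BOOLEAN_BITS;
}

/* Other values: a four-bit identifying number sits above the tag. */
static inline word special_encode(unsigned id) {
    assert(id <= 0xF);
    return ((word)id << 4) | SPECIAL_BITS;
}

static inline bool is_fixnum(word w)    { return (w & 1) != 0; }
static inline bool is_character(word w) { return (w & TAG_MASK) == CHARACTER_BITS; }
static inline bool is_boolean(word w)   { return (w & TAG_MASK) == BOOLEAN_BITS; }
static inline bool is_special(word w)   { return (w & TAG_MASK) == SPECIAL_BITS; }
```

With these definitions, {{fix_encode(5)}} yields the bit pattern 1011, {{bool_encode(true)}} yields 1 0110, and {{special_encode(3)}} yields 0011 1110, the end-of-file pattern listed above.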
=== Non-immediate objects

Collectively, the two lowest-order bits are known as the ''immediate mark bits''.  When the lowest bit is set, the object is a fixnum, as described above, and the next bit is part of its value.  When the lowest bit is clear but the next bit is set, it is an immediate object other than a fixnum.  If neither bit is set, the object is non-immediate, as described below.

Non-immediate objects are blocks of data represented by a pointer into
the heap.  The pointer's immediate mark bits must be zero to indicate the object is non-immediate,
which means the data block must be aligned on a 4-byte boundary, at minimum.  Alignment of data words
is required on modern architectures anyway, so we get the ability to distinguish between immediate and non-immediate objects for free.

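The test on the immediate mark bits can be sketched as follows. The names {{word}}, {{kind}}, {{classify}} and {{block_to_word}} are hypothetical and chosen only for this illustration; the sketch assumes that a pointer converted to {{uintptr_t}} keeps its low address bits, which holds on common platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a machine word; not a chicken.h type. */
typedef uintptr_t word;

typedef enum { KIND_FIXNUM, KIND_OTHER_IMMEDIATE, KIND_BLOCK } kind;

/* Inspect the two lowest-order bits, the immediate mark bits. */
static inline kind classify(word w) {
    if (w & 1) return KIND_FIXNUM;          /* lowest bit set */
    if (w & 2) return KIND_OTHER_IMMEDIATE; /* lowest clear, next set */
    return KIND_BLOCK;                      /* both clear: a pointer */
}

/* Convert a block address to a word, checking 4-byte alignment. */
static inline word block_to_word(const void *p) {
    word w = (word)p;
    assert((w & 3) == 0); /* mark bits of a block pointer must be zero */
    return w;
}
```

Any address of a word-aligned object passes the alignment check, so {{classify(block_to_word(p))}} reports a block for such a pointer, while the boolean pattern 1 0110 is reported as a non-fixnum immediate.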
The first word of the data block contains a header, which gives
information about the type of the object. The header is a
single machine word.

The 24 lowest-order bits (56 on 64-bit architectures) contain the length of the data object, which is either
the number of bytes in a string or byte-vector, or the number
of elements for a vector or record type.

The remaining bits are placed in the high-order end of the header.
The four highest-order bits are used for garbage
collection or internal data type dispatching.

; C_GC_FORWARDING_BIT : Flag used for forwarding garbage collected object pointers.

; C_BYTEBLOCK_BIT : Flag that specifies whether this data object contains raw bytes (a string or blob) or pointers to other data objects.

; C_SPECIALBLOCK_BIT : Flag that specifies whether this object contains a ''special'' non-object pointer value in its first slot. Closures are an example of this kind of object: they are vector-type objects with the code pointer as the first item.

; C_8ALIGN_BIT : Flag that specifies whether the data area of this block should be aligned on an 8-byte boundary (floating-point numbers, for example).

After these four bits comes a 4-bit type code representing one of the following types:

'''vectors''': vector objects with type bits {{C_VECTOR_TYPE}}, currently 0000.

'''symbols''': vector objects with type bits {{C_SYMBOL_TYPE}}, currently 0001. The three slots
contain the toplevel variable value, the print-name (a string), and the property list
of the symbol.

'''strings''': byte-vector objects with type bits {{C_STRING_TYPE}}, currently 0010.

'''pairs''': vector-like objects with type bits {{C_PAIR_TYPE}}, currently 0011.
The car and the cdr are contained in the first and second slots,
respectively.

'''closures''': special vector objects with type bits
{{C_CLOSURE_TYPE}}, currently 0100. The first slot contains a pointer to a
compiled C function. Any extra slots contain the free variables (since
a flat closure representation is used).

'''flonums''': byte-vector objects with type bits
{{C_FLONUM_BITS}}, currently 0101. Slots one and two (or a single slot on
64-bit architectures) contain a 64-bit floating-point number, in the
representation used by the host system's C compiler.

'''ports''': special vector objects with type bits
{{C_PORT_TYPE}}, currently 0111. The first slot contains a pointer to a
file-stream, if this is a file port, or NULL if not. The other slots
contain housekeeping data used for this port.

'''structures''': vector objects with type bits
{{C_STRUCTURE_TYPE}}, currently 1000. The first slot contains a symbol that
specifies the kind of structure this record is an instance of. The other
slots contain the actual record items.

'''locatives''': special vector objects with type bits
{{C_LOCATIVE_TYPE}}, currently 1010.  A locative object holds 4 slots:
a raw pointer to the location inside the object referred to by the locative,
the offset in bytes from the start of the object referred to, the type of
the location (whether it refers to an unboxed numeric value or a normal
object slot that holds a pointer to Scheme data) and a flag indicating
whether this locative is "weak". If the locative is non-weak, slot #4 holds
a pointer to the object referred to.

'''pointers''': special vector objects with type bits
{{C_POINTER_TYPE}}, currently 1001. The single slot contains a machine pointer.

'''tagged pointers''': special vector objects with type bits
{{C_TAGGED_POINTER_TYPE}}, currently 1011. Tagged pointers are similar to pointers,
but the object contains an additional
slot with a tag (an arbitrary data object) that identifies the type
of the pointer.

'''lambda infos''': byte-vector objects with type bits {{C_LAMBDA_INFO_TYPE}}, currently 1101.

'''buckets''': vector objects with type bits {{C_BUCKET_TYPE}}, currently 1111. These are
only used internally for the implementation of symbol tables.

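The header layout described above (four flag bits, a 4-bit type code, and a 24-bit length) can be sketched for a 32-bit header word as follows. All names here ({{header32}}, {{HDR_...}}, {{TYPE_...}}, {{header_make}} and the accessors) are hypothetical. The sketch also assumes that the four flags occupy the top four bits in the order they are listed above, with the forwarding bit highest; consult {{chicken.h}} for the authoritative definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models a 32-bit header word; hypothetical, not a chicken.h type. */
typedef uint32_t header32;

#define HDR_LENGTH_MASK   UINT32_C(0x00FFFFFF) /* 24 lowest-order bits */
#define HDR_TYPE_SHIFT    24
#define HDR_TYPE_MASK     UINT32_C(0xF)
#define HDR_FLAG_MASK     UINT32_C(0xF0000000)

/* Assumed flag positions: listed order, highest bit first. */
#define HDR_GC_FORWARDING UINT32_C(0x80000000)
#define HDR_BYTEBLOCK     UINT32_C(0x40000000)
#define HDR_SPECIALBLOCK  UINT32_C(0x20000000)
#define HDR_8ALIGN        UINT32_C(0x10000000)

/* A few of the type codes quoted in the list above. */
enum { TYPE_VECTOR = 0x0, TYPE_SYMBOL = 0x1, TYPE_STRING = 0x2, TYPE_PAIR = 0x3 };

/* Combine flags, type code and length into one header word. */
static inline header32 header_make(uint32_t flags, unsigned type, uint32_t length) {
    assert((flags & ~HDR_FLAG_MASK) == 0);
    assert(type <= 0xF);
    assert(length <= HDR_LENGTH_MASK);
    return flags | ((header32)type << HDR_TYPE_SHIFT) | length;
}

static inline uint32_t header_length(header32 h) {
    return h & HDR_LENGTH_MASK;
}

static inline unsigned header_type(header32 h) {
    return (unsigned)((h >> HDR_TYPE_SHIFT) & HDR_TYPE_MASK);
}

static inline bool header_is_byteblock(header32 h) {
    return (h & HDR_BYTEBLOCK) != 0;
}
```

Under these assumptions, a pair header is {{header_make(0, TYPE_PAIR, 2)}}, and a five-byte string header is {{header_make(HDR_BYTEBLOCK, TYPE_STRING, 5)}}, since strings hold raw bytes rather than pointers to other objects.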
The actual data follows immediately after the header. Note that
block addresses are always aligned to the native machine-word
boundary.

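The "header followed by data" layout can be illustrated with a C struct that uses a flexible array member. This is only a sketch: {{block}}, {{make_pair}} and {{block_length}} are hypothetical names, the block is obtained from {{malloc}} rather than from the CHICKEN heap, and the length mask simply covers every bit below the type code (24 bits on a 32-bit word).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a machine word; not a chicken.h type. */
typedef uintptr_t word;

/* One header word, followed directly by the data slots. */
typedef struct {
    word header;
    word slots[]; /* flexible array member (C99) */
} block;

/* The type code sits just below the four highest-order flag bits. */
#define TYPE_SHIFT  (sizeof(word) * CHAR_BIT - 8)
#define LENGTH_MASK ((((word)1) << TYPE_SHIFT) - 1)
#define PAIR_TYPE   ((word)0x3) /* 0011, as quoted above */

/* Build a pair-shaped block: type code 0011 and two slots. */
static inline block *make_pair(word car, word cdr) {
    /* The data must start right after the header, with no padding. */
    assert(offsetof(block, slots) == sizeof(word));
    block *b = malloc(sizeof(block) + 2 * sizeof(word));
    if (b == NULL) return NULL;
    b->header = (PAIR_TYPE << TYPE_SHIFT) | (word)2;
    b->slots[0] = car; /* first slot: car */
    b->slots[1] = cdr; /* second slot: cdr */
    return b;
}

static inline word block_length(const block *b) {
    return b->header & LENGTH_MASK;
}
```

A caller would pass already encoded words as {{car}} and {{cdr}}, for example the fixnum patterns 0011 and 0101 for the integers 1 and 2, and release the block with {{free}} when done.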
Data objects may be allocated outside of the garbage-collected heap, as
long as their layout follows the above-mentioned scheme. But care has to
be taken not to mutate these objects with heap data (i.e. non-immediate
objects), because this will confuse the garbage collector.

For more information see the header file {{chicken.h}}.

---
Previous: [[Cross development]]

Next: [[Bugs and limitations]]