
Last change on this file since 23717 was 23717, checked in by Ivan Raikov, 10 years ago

[[toc:]]
[[tags: manual]]

== Data representation

There exist two different kinds of data objects in the CHICKEN system:
immediate and non-immediate objects.

=== Immediate objects

Immediate objects are represented by a single machine word, 32 or 64 bits depending on the architecture.  They come in four different flavors:

'''fixnums''', that is, small exact integers, where the lowest-order bit is
set to 1 and the integer value occupies the remaining bits. This leaves
31 bits for the signed numeric value (63 bits on 64-bit architectures).
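
As a rough illustration, the following C sketch tags and untags fixnums as described above. The names {{word}}, {{make_fixnum}}, {{is_fixnum}} and {{fixnum_value}} are hypothetical and do not come from {{chicken.h}}; the sketch only assumes that the integer value sits directly above the tag bit, so that tagging amounts to {{n * 2 + 1}}.

<enscript highlight="c">
#include <assert.h>
#include <stdint.h>

/* Hypothetical signed machine-word type (not CHICKEN's own typedef). */
typedef intptr_t word;

/* One bit is used for the tag, so the value range is halved. */
#define FIXNUM_MAX (INTPTR_MAX >> 1)
#define FIXNUM_MIN (-FIXNUM_MAX - 1)

/* Tag an integer: move the value up one bit and set the lowest-order bit.
   n * 2 + 1 has the same bits as (n << 1) | 1 on two's-complement machines,
   but avoids shifting a negative number. */
static word make_fixnum(intptr_t n)
{
    assert(n >= FIXNUM_MIN && n <= FIXNUM_MAX);
    return n * 2 + 1;
}

/* A word is a fixnum exactly when its lowest-order bit is set. */
static int is_fixnum(word w)
{
    return (w & 1) != 0;
}

/* Untag: for an odd w, (w - 1) / 2 is exact, including for negative values. */
static intptr_t fixnum_value(word w)
{
    return (w - 1) / 2;
}
</enscript>

Multiplying and dividing instead of shifting keeps the sketch well defined in C for negative values while producing the same bit patterns a shift would.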

'''characters''', where the four lowest-order bits are equal to
{{C_CHARACTER_BITS}}, currently 1010. The Unicode code point
of the character is encoded in the next 24 bits.
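
A similar sketch for characters follows. The text above only says that the code point follows the tag bits; this sketch assumes it starts at bit 8, leaving bits 4 to 7 clear. That shift, like the helper names {{make_character}}, {{is_character}} and {{character_code}}, is an assumption made for illustration, so consult {{chicken.h}} for the actual layout.

<enscript highlight="c">
#include <stdint.h>

typedef uintptr_t word;          /* hypothetical machine-word type */

#define CHARACTER_BITS 0xAu      /* 1010 in the four lowest-order bits */
#define CHAR_SHIFT     8         /* ASSUMPTION: the code point starts at bit 8 */
#define CHAR_MASK      0xFFFFFFu /* 24 bits of code point */

/* Hypothetical helpers, not the chicken.h macros. */
static word make_character(uint32_t code_point)
{
    return ((word)(code_point & CHAR_MASK) << CHAR_SHIFT) | CHARACTER_BITS;
}

static int is_character(word w)
{
    return (w & 0xFu) == CHARACTER_BITS;
}

static uint32_t character_code(word w)
{
    return (uint32_t)((w >> CHAR_SHIFT) & CHAR_MASK);
}
</enscript>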

'''booleans''', where the four lowest-order bits are equal to {{C_BOOLEAN_BITS}},
currently 0110. The next bit is one for #t and zero for #f.
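
The boolean encoding can be checked by hand: the tag 0110 is 0x6 and the bit just above the four tag bits is 0x10, so #f is the word 0x06 and #t is 0x16. The sketch below builds and inspects both; {{make_boolean}}, {{is_boolean}} and {{boolean_value}} are hypothetical names.

<enscript highlight="c">
#include <stdint.h>

typedef uintptr_t word;            /* hypothetical machine-word type */

#define BOOLEAN_BITS 0x6u          /* 0110 in the four lowest-order bits */
#define TRUE_BIT     0x10u         /* the next bit: 1 for #t, 0 for #f */

/* Hypothetical helpers, not the chicken.h macros. */
static word make_boolean(int b)
{
    return BOOLEAN_BITS | (b ? TRUE_BIT : 0u);
}

static int is_boolean(word w)
{
    return (w & 0xFu) == BOOLEAN_BITS;
}

static int boolean_value(word w)
{
    return (w & TRUE_BIT) != 0;
}
</enscript>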

'''other values''': the empty list, the value of unbound identifiers,
the undefined value (void), and end-of-file.  The four lowest-order bits are equal to
{{C_SPECIAL_BITS}}, currently 1110.  The next four bits contain an identifying
number for this type of object, one of:
{{C_SCHEME_END_OF_LIST}}, currently 0000;
{{C_SCHEME_UNDEFINED}}, currently 0001;
{{C_SCHEME_UNBOUND}}, currently 0010; or
{{C_SCHEME_END_OF_FILE}}, currently 0011.
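
Following the same pattern, each of these values is its four-bit identifier placed above the tag bits 1110, which gives 0x0E for the empty list, 0x1E for the undefined value, 0x2E for the unbound value and 0x3E for end-of-file. The macro and function names in this sketch are hypothetical stand-ins for the constants listed above.

<enscript highlight="c">
#include <stdint.h>

typedef uintptr_t word;   /* hypothetical machine-word type */

#define SPECIAL_BITS 0xEu                               /* 1110 */
#define MAKE_SPECIAL(id) (((word)(id) << 4) | SPECIAL_BITS)

/* Hypothetical stand-ins for the constants listed above. */
#define END_OF_LIST MAKE_SPECIAL(0x0)   /* 0000 */
#define UNDEFINED   MAKE_SPECIAL(0x1)   /* 0001 */
#define UNBOUND     MAKE_SPECIAL(0x2)   /* 0010 */
#define END_OF_FILE MAKE_SPECIAL(0x3)   /* 0011 */

static int is_special(word w)
{
    return (w & 0xFu) == SPECIAL_BITS;
}

/* The identifying number held in the four bits above the tag. */
static unsigned special_id(word w)
{
    return (unsigned)((w >> 4) & 0xFu);
}
</enscript>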

=== Non-immediate objects

Collectively, the two lowest-order bits are known as the ''immediate mark bits''.  When the lowest bit is set, the object is a fixnum, as described above, and the next bit is part of its value.  When the lowest bit is clear but the next bit is set, it is an immediate object other than a fixnum.  If neither bit is set, the object is non-immediate, as described below.
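
The decision procedure in this paragraph can be written as a small classifier. This is a sketch with hypothetical names ({{classify}} and the {{KIND_...}} enumerators); once the mark bits are known to be 10, it uses the four-bit tags from the section on immediate objects to tell the immediate kinds apart.

<enscript highlight="c">
#include <stdint.h>

typedef uintptr_t word;   /* hypothetical machine-word type */

/* Hypothetical classification result, not a CHICKEN type. */
enum kind {
    KIND_FIXNUM,
    KIND_CHARACTER,
    KIND_BOOLEAN,
    KIND_SPECIAL,
    KIND_BLOCK_POINTER,
    KIND_UNKNOWN
};

static enum kind classify(word w)
{
    if (w & 1u)                  /* lowest bit set: a fixnum */
        return KIND_FIXNUM;
    if ((w & 3u) == 0u)          /* both mark bits clear: a pointer to a block */
        return KIND_BLOCK_POINTER;
    switch (w & 0xFu) {          /* mark bits are 10: the low four bits decide */
    case 0xAu: return KIND_CHARACTER;  /* 1010 */
    case 0x6u: return KIND_BOOLEAN;    /* 0110 */
    case 0xEu: return KIND_SPECIAL;    /* 1110 */
    default:   return KIND_UNKNOWN;    /* 0010 is not assigned above */
    }
}
</enscript>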

Non-immediate objects are blocks of data represented by a pointer into
the heap.  The pointer's immediate mark bits must be zero to indicate that the object is non-immediate,
so the data block must be aligned on at least a 4-byte boundary.  Modern architectures require aligned data words
anyway, so the ability to distinguish between immediate and non-immediate objects comes for free.
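
The following sketch illustrates why aligned blocks leave the mark bits free: a pointer to a word-aligned object has its two lowest-order bits clear, while a pointer one byte further on does not. It assumes, and checks at compile time, that machine words are at least 4-byte aligned; {{example_block}} and {{has_clear_mark_bits}} are hypothetical names.

<enscript highlight="c">
#include <stdint.h>

typedef uintptr_t word;   /* hypothetical machine-word type */

/* The sketch relies on machine words being at least 4-byte aligned. */
_Static_assert(_Alignof(word) >= 4, "word must be at least 4-byte aligned");

#define IMMEDIATE_MARK_BITS 0x3u

/* Hypothetical three-word block: a header word and two data words. */
static word example_block[3];

/* True when a pointer could be stored as a non-immediate object. */
static int has_clear_mark_bits(const void *p)
{
    return ((uintptr_t)p & IMMEDIATE_MARK_BITS) == 0u;
}
</enscript>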

The first word of the data block contains a header, which gives
information about the type of the object. The header is a
single machine word.

The 24 lowest-order bits (56 bits on 64-bit architectures) contain the length of the data object, which is either
the number of bytes in a string or byte-vector, or the number
of elements for a vector or record type.

The remaining bits are placed in the high-order end of the header.
The four highest-order bits are used for garbage
collection or internal data type dispatching.

; C_GC_FORWARDING_BIT : Flag used for forwarding garbage collected object pointers.

; C_BYTEBLOCK_BIT : Flag that specifies whether this data object contains raw bytes (a string or byte-vector) or pointers to other data objects.

; C_SPECIALBLOCK_BIT : Flag that specifies whether this object contains a ''special'' non-object pointer value in its first slot. Closures are an example of this kind of object: they are vector-type objects with the code pointer as the first item.

; C_8ALIGN_BIT : Flag that specifies whether the data area of this block should be aligned on an 8-byte boundary (floating-point numbers, for example).
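
Putting the header description together, the sketch below builds and decodes a header word. It assumes a 32-bit header, and it assumes that the four flags occupy the top four bits in the order listed above, starting from the most significant bit. Neither these flag positions nor the helper names are taken from {{chicken.h}}, so check that file for the real values.

<enscript highlight="c">
#include <stdint.h>

typedef uint32_t header;   /* ASSUMPTION: a 32-bit header word */

/* ASSUMPTION: flags in the order listed above, most significant bit first. */
#define GC_FORWARDING_BIT 0x80000000u
#define BYTEBLOCK_BIT     0x40000000u
#define SPECIALBLOCK_BIT  0x20000000u
#define ALIGN8_BIT        0x10000000u
#define TYPE_MASK         0x0F000000u   /* the 4-bit type code below the flags */
#define SIZE_MASK         0x00FFFFFFu   /* the 24-bit length field */

/* Hypothetical helpers, not the chicken.h macros. */
static header make_header(uint32_t flags, unsigned type, uint32_t length)
{
    return (flags & 0xF0000000u) | ((header)(type & 0xFu) << 24) | (length & SIZE_MASK);
}

static unsigned header_type(header h)   { return (h & TYPE_MASK) >> 24; }
static uint32_t header_length(header h) { return h & SIZE_MASK; }
static int is_byteblock(header h)       { return (h & BYTEBLOCK_BIT) != 0; }
static int is_specialblock(header h)    { return (h & SPECIALBLOCK_BIT) != 0; }
</enscript>

Under these assumptions, {{make_header(BYTEBLOCK_BIT, 0x2, 5)}} yields 0x42000005, a header for a five-byte string.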

After these four bits comes a 4-bit type code representing one of the following types:

'''vectors''': vector objects with type bits {{C_VECTOR_TYPE}}, currently 0000.

'''symbols''': vector objects with type bits {{C_SYMBOL_TYPE}}, currently 0001. The three slots
contain the toplevel variable value, the print-name (a string), and the property list
of the symbol.

'''strings''': byte-vector objects with type bits {{C_STRING_TYPE}}, currently 0010.

'''pairs''': vector-like objects with type bits {{C_PAIR_TYPE}}, currently 0011.
The car and the cdr are contained in the first and second slots,
respectively.
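
As a sketch of this layout, the following C code fills a three-word block with a header, the car and the cdr, and reads the slots back. To behave the same on any host, it models a 32-bit system with {{uint32_t}} words and stores only immediate values (a fixnum and the empty list), so no real heap pointers are involved. The header assumes that no flag bits are set for pairs, and all names are hypothetical.

<enscript highlight="c">
#include <stdint.h>

/* Models a 32-bit system on any host: words are uint32_t and only
   immediate values are stored, so no real pointers are needed. */
typedef uint32_t word32;

#define PAIR_TYPE   0x3u                     /* type code 0011 */
#define END_OF_LIST 0x0Eu                    /* special bits 1110, identifier 0000 */
#define FIX(n)      ((word32)(n) * 2u + 1u)  /* fixnum tag, for small non-negative n */

/* Hypothetical constructor: header, then car (slot 0), then cdr (slot 1). */
static void make_pair(word32 block[3], word32 car, word32 cdr)
{
    block[0] = (PAIR_TYPE << 24) | 2u;   /* no flag bits, type 0011, 2 slots */
    block[1] = car;
    block[2] = cdr;
}

/* Slots start in the word right after the header. */
static word32 slot(const word32 block[], unsigned i)
{
    return block[1 + i];
}

static word32 pair_car(const word32 block[]) { return slot(block, 0); }
static word32 pair_cdr(const word32 block[]) { return slot(block, 1); }
</enscript>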

'''closures''': special vector objects with type bits
{{C_CLOSURE_TYPE}}, currently 0100. The first slot contains a pointer to a
compiled C function. Any extra slots contain the free variables (since
a flat closure representation is used).
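
The flat closure layout can be sketched in C as follows. Because ISO C does not guarantee that a function pointer fits in an integer word, each slot is modeled as a union of a word and a code pointer; that modeling choice, the calling convention (the code receives its own closure so it can read its free variables) and all names are assumptions made for this illustration. The header holds only the slot count, and the free variable and argument are plain integers rather than tagged fixnums.

<enscript highlight="c">
#include <stdint.h>

typedef uintptr_t word;   /* hypothetical machine-word type */

union slot;
/* Hypothetical calling convention: the code gets its own closure block. */
typedef word (*code_ptr)(union slot *self, word arg);

/* Each word of the block is modeled as a union, so that the code pointer
   can be stored without converting it to an integer. */
union slot {
    word value;
    code_ptr code;
};

/* Compiled code for (lambda (x) (+ x n)), where n is the free variable.
   self[0] is the header, self[1] the code pointer, self[2] the value of n. */
static word add_n(union slot *self, word arg)
{
    return arg + self[2].value;
}

/* Build a flat closure: the captured value is copied into the block. */
static void make_adder(union slot block[3], word n)
{
    block[0].value = 2;       /* header: 2 slots; flag and type bits omitted */
    block[1].code = add_n;    /* first slot: pointer to the compiled function */
    block[2].value = n;       /* extra slot: the free variable */
}

/* Call a closure by fetching its code pointer and passing the closure itself. */
static word call_closure(union slot *closure, word arg)
{
    return closure[1].code(closure, arg);
}
</enscript>

Two closures built by {{make_adder}} share the same code pointer and differ only in the value copied into their extra slot, which is what makes the representation flat.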

'''flonums''': byte-vector objects with type bits
{{C_FLONUM_TYPE}}, currently 0101. Slots one and two (or a single slot on
64-bit architectures) contain a 64-bit floating-point number, in the
representation used by the host system's C compiler.
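
The following sketch shows why a flonum needs two slots on 32-bit platforms and one on 64-bit platforms: the 8 bytes of a {{double}} are copied into as many machine words as it takes. It uses {{memcpy}} so that no alignment is assumed for the data area, assumes an 8-byte {{double}} (checked at compile time), and leaves the flag and type bits out of the header; the names are hypothetical.

<enscript highlight="c">
#include <stdint.h>
#include <string.h>

typedef uintptr_t word;   /* hypothetical machine-word type */

_Static_assert(sizeof(double) == 8, "this sketch assumes an 8-byte double");

/* Words needed for 8 bytes: 2 with 4-byte words, 1 with 8-byte words. */
#define FLONUM_WORDS ((sizeof(double) + sizeof(word) - 1) / sizeof(word))

/* Hypothetical flonum block: a header word followed by the data slots. */
struct flonum_block {
    word header;
    word data[FLONUM_WORDS];
};

static void make_flonum(struct flonum_block *b, double x)
{
    b->header = sizeof(double);        /* byte length only; flag and type bits omitted */
    memcpy(b->data, &x, sizeof x);     /* raw bytes in the host's double format */
}

static double flonum_value(const struct flonum_block *b)
{
    double x;
    memcpy(&x, b->data, sizeof x);
    return x;
}
</enscript>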

'''ports''': special vector objects with type bits
{{C_PORT_TYPE}}, currently 0111. The first slot contains a pointer to a file
stream if the port is backed by a file pointer, or NULL if not. The other slots
contain housekeeping data used for this port.

'''structures''': vector objects with type bits
{{C_STRUCTURE_TYPE}}, currently 1000. The first slot contains a symbol that
specifies the kind of structure this record is an instance of. The other
slots contain the actual record items.

'''pointers''': special vector objects with type bits
{{C_POINTER_TYPE}}, currently 1001. The single slot contains a machine pointer.

'''locatives''': special vector objects with type bits
{{C_LOCATIVE_TYPE}}, currently 1010.  A locative object holds 4 slots:
a raw pointer to the location inside the object referred to by the locative,
the offset in bytes from the start of the object referred to, the type of
the location (whether it refers to an unboxed numeric value or a normal
object slot that holds a pointer to Scheme data), and a fourth slot that records
whether this locative is "weak": if the locative is non-weak, this slot holds
a pointer to the object referred to.

'''tagged pointers''': special vector objects with type bits
{{C_TAGGED_POINTER_TYPE}}, currently 1011. Tagged pointers are similar to pointers,
but the object contains an additional
slot with a tag (an arbitrary data object) that identifies the type
of the pointer.

'''SWIG pointers''': special vector objects with type bits {{C_SWIG_POINTER_TYPE}}, currently
1100.

'''lambda infos''': byte-vector objects with type bits {{C_LAMBDA_INFO_TYPE}}, currently 1101.

'''buckets''': vector objects with type bits {{C_BUCKET_TYPE}}, currently 1111.
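
The list above can be condensed into a lookup table. The sketch below records, for each 4-bit type code, its name and the header flags implied by its description (byte-vector objects set the byte-block flag, special vector objects set the special-block flag, and flonums also request 8-byte alignment), then builds a 32-bit header from it. The flag positions assume that the four flags occupy the top four bits in the order listed earlier, most significant first; the names are hypothetical, and the codes 0110 and 1110, which the list does not mention, are left empty.

<enscript highlight="c">
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: flag positions, most significant bit first. */
#define BYTEBLOCK_BIT    0x40000000u
#define SPECIALBLOCK_BIT 0x20000000u
#define ALIGN8_BIT       0x10000000u

/* Hypothetical per-type record. */
struct type_info {
    const char *name;   /* NULL for codes not listed in the text */
    uint32_t flags;     /* header flags implied by the description */
};

static const struct type_info type_table[16] = {
    [0x0] = { "vector",         0 },
    [0x1] = { "symbol",         0 },
    [0x2] = { "string",         BYTEBLOCK_BIT },
    [0x3] = { "pair",           0 },
    [0x4] = { "closure",        SPECIALBLOCK_BIT },
    [0x5] = { "flonum",         BYTEBLOCK_BIT | ALIGN8_BIT },
    [0x7] = { "port",           SPECIALBLOCK_BIT },
    [0x8] = { "structure",      0 },
    [0x9] = { "pointer",        SPECIALBLOCK_BIT },
    [0xA] = { "locative",       SPECIALBLOCK_BIT },
    [0xB] = { "tagged pointer", SPECIALBLOCK_BIT },
    [0xC] = { "SWIG pointer",   SPECIALBLOCK_BIT },
    [0xD] = { "lambda info",    BYTEBLOCK_BIT },
    [0xF] = { "bucket",         0 },
};

/* Returns the 4-bit type code for a name, or -1 if the name is not listed. */
static int type_code(const char *name)
{
    for (int code = 0; code < 16; code++)
        if (type_table[code].name != NULL && strcmp(type_table[code].name, name) == 0)
            return code;
    return -1;
}

/* Builds a 32-bit header: implied flags, type code in bits 24-27, length below. */
static uint32_t type_header(unsigned code, uint32_t length)
{
    code &= 0xFu;
    return type_table[code].flags | ((uint32_t)code << 24) | (length & 0x00FFFFFFu);
}
</enscript>

Under these assumptions, a three-byte string gets the header 0x42000003 and a flonum (8 bytes of data) gets 0x55000008.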

The actual data follows immediately after the header. Note that
block addresses are always aligned to the native machine-word
boundary.

Data objects may be allocated outside of the garbage-collected heap, as
long as their layout follows the scheme described above. Care must be taken, however,
not to store heap data (that is, non-immediate objects) into such objects,
because this will confuse the garbage collector.

For more information see the header file {{chicken.h}}.

---
Previous: [[Cross development]]

Next: [[Bugs and limitations]]