source: project/wiki/eggref/4/sdbm @ 34183

[[tags: egg]]
== sdbm

'''sdbm''' is a clone of the SDBM database library.
[[toc:]]
=== Overview

'''sdbm''' is a reimplementation of the public-domain SDBM database library,
which is itself essentially a clone of NDBM.  SDBM provides a simple
key-value store with a fixed limit on data length and no ACID semantics
to speak of: it offers no write locking, no atomicity, no transactions,
and little guarantee that your file won't be corrupted in a crash.
It also relies on sparse file support, which is not present on filesystems
such as HFS+.

Despite these shortcomings, it is a simple implementation without
dependencies, written completely in Scheme.  Some issues
with the original implementation have also been remedied: byte order is
configurable, and page and directory block size can be adjusted
at runtime.  Therefore, '''sdbm''' might still be useful as a very
simple key-value store for non-critical applications.

=== Joint Database Technology

External, binary, persistent SDBM database files (tied to program hash tables) become really useful when their key/value pairs serve as a random-access index into a large relational "text" database made up of many flat files with fixed-length records and parent/child (1-to-many) record relationships. The key can be a single field, part of a single field, or a compound key built from several whole or partial fields concatenated together (perhaps with a delimiter such as a pipe "|" between them). The value is the byte offset of a specific record in a flat file, that is, the position to seek the file pointer to in order to open that record for READ, READ/WRITE, or APPEND access.

Multiple SDBM files can be set up as alternate indexes into each flat file, each SDBM file using a different key (again a single field, a partial field, or a compound key). An alternate key that allows duplicates can be created by making an incremented number, perhaps in the range 1-9999, part of the key.
        Key example:    LastName|IncNbr(perhaps in range 1-9999)
                        "Williams|1" ... "Williams|5745".

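The indexing scheme described above might be sketched as follows, in CHICKEN 4 syntax. This is only an illustration under stated assumptions: {{record-length}}, {{compound-key}}, {{next-free-key}}, {{index-record!}} and {{read-record}} are hypothetical names, not part of '''sdbm''', and the linear search for a free increment costs one {{fetch}} per existing duplicate.

```scheme
(use sdbm posix extras)   ; CHICKEN 4 syntax, matching this eggref/4 page

;; Hypothetical fixed record length, in bytes, of the flat file.
(define record-length 32)

;; Build a compound key such as "Williams|1".
(define (compound-key last-name n)
  (string-append last-name "|" (number->string n)))

;; Find the first unused increment for LAST-NAME, starting at 1.
(define (next-free-key db last-name)
  (let loop ((n 1))
    (let ((key (compound-key last-name n)))
      (if (fetch db key) (loop (+ n 1)) key))))

;; Index a record.  Values must be strings, so the byte offset is
;; stored as decimal text.  Returns the key that was used.
(define (index-record! db last-name offset)
  (let ((key (next-free-key db last-name)))
    (store! db key (number->string offset))
    key))

;; Look up KEY and read the record it points to, or return #f
;; if KEY is not in the index.
(define (read-record db flat-file key)
  (let ((offset (fetch db key)))
    (and offset
         (call-with-input-file flat-file
           (lambda (port)
             (set-file-position! port (string->number offset))
             (read-string record-length port))))))
```
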
When Flat File records are edited, the changes are made "in place", overwriting existing data in the flat file record. Design your user interface carefully so that any change to Flat File data is also applied to the corresponding SDBM key/value pairs, keeping the index properly maintained. A DELETE flag field can be used to mark records in both the Flat Files and the SDBM files; an application program treats the flag as a BYPASS indicator, and the marked records can later be deleted in a server-side BATCH run during off hours.

In a multi-user environment, a manual record locking system could be designed to lock a specific record for editing by one user.  The username, flat file name, and record offset could be stored in an external SDBM database file (tied to a program hash table) at the time a user requests to edit a specific record. Once the record is released from EDIT (SAVE or CANCEL issued), the lock is removed from the SDBM file.  Each time a user requests to edit a record, the user interface to the Flat File database would look up this Lock File to determine whether the record is available for edit or already locked by another user.

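A minimal sketch of such a Lock File, assuming the lock key is the flat file name and record offset joined by "|" and the value is the username. The names {{lock-key}}, {{try-lock!}} and {{unlock!}} are hypothetical. Because SDBM itself offers no write locking or atomicity, the check-then-store step here is not safe against two processes racing for the same record; it only illustrates the bookkeeping.

```scheme
(use sdbm)

;; Key identifying one record of one flat file, e.g. "people.txt|640".
(define (lock-key flat-file offset)
  (string-append flat-file "|" (number->string offset)))

;; Returns #t if USER holds the lock after the call (newly acquired
;; or already held), #f if another user holds it.
(define (try-lock! locks user flat-file offset)
  (let* ((key (lock-key flat-file offset))
         (holder (fetch locks key)))
    (cond ((not holder) (store! locks key user) #t)
          (else (string=? holder user)))))

;; Releases the lock if USER holds it.  Returns #t if a lock was removed.
(define (unlock! locks user flat-file offset)
  (let* ((key (lock-key flat-file offset))
         (holder (fetch locks key)))
    (if (and holder (string=? holder user))
        (begin (delete! locks key) #t)
        #f)))
```
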
In a well-designed user interface, child records (in a 1-to-many, parent/child relationship) are not directly editable; they can be edited only while the corresponding parent record is locked for edit.

This arrangement is stable and safe in the sense that the binary SDBM files can easily be rebuilt from the Flat File database records. That is more desirable than, say, an MS-Access database, where the text data and indexes are stored together in a binary file that can become corrupted, sometimes making it difficult to rescue your important textual data.  In MS-Access, the database data and objects (back-end tables, indexes, and data; front-end reports, forms, macros, etc.) are often mistakenly stored in one binary file, although MS-Access does provide the means to split the back end and front end into separate files, allowing for a much more stable DB system.

One advantage that joint (tandem, dual-technology) Flat File/SDBM databases have over MS-Access, for example, is that they do not require MDAC (Microsoft Data Access Components) to be installed on each client. ODBC-enabled MS-Access databases used without the MS-Access front-end software can also be combined into a huge database (in practice perhaps up to 1 terabyte or 5 billion rows, depending on whether it is a READ ONLY data warehouse or a READ/WRITE database). Each MDB file then serves as a single table, a group of tables, or a partial table whose layout is common to all the MDB files, with the data kept logically segregated so that a random access touches only one, or perhaps two, MDB files, each containing as many as 10 million rows.

Flat File/SDBM or MDB relational database systems can use a file naming convention to make it easy for a DB application user interface to determine which file(s) to look in. For example, a flat file named US_CENSUS_2010_TX_A.txt (or .mdb for MS-Access) would identify a file logically segregated to contain only data on Texas citizens whose last name begins with the letter "A".  A business would need to determine which logical segregation of data makes the most sense for its operational needs. Server-side batch EDIT operations and heavy reporting could be performed during off hours. For common statistics, a statistics table could be maintained (server-side, during off hours) that answers most user questions requiring an aggregate across the entire database system; in the example above, that would be statistics for the entire U.S. and for each of the 50 states.

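The naming convention from the census example can be computed from the lookup fields. The helper below is a hypothetical illustration in plain Scheme; it assumes a non-empty last name and a two-letter state code supplied by the caller.

```scheme
;; Hypothetical helper: derive the file to search from the census year,
;; the state code, and the first letter of the last name.
(define (census-file-name year state last-name extension)
  (string-append "US_CENSUS_" (number->string year)
                 "_" state
                 "_" (string (char-upcase (string-ref last-name 0)))
                 "." extension))

(census-file-name 2010 "TX" "Anderson" "txt")   ; => "US_CENSUS_2010_TX_A.txt"
```
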
=== Installation

Use {{chicken-install}} as usual.  Some configuration can be done
by defining certain features at compile time.

; Byte order : {{sdbm-little-endian}} or {{sdbm-big-endian}} set the read and write order of bytes in the file.  If no byte order is specified, host order is used, as in the original implementation.
; Hash function : {{sdbm-hash-djb}} selects an alternate hash function by Dan Bernstein.  If no hash function is specified, the native SDBM hash function is used.

To define a feature, set it in {{CSC_OPTIONS}} before calling {{chicken-install}}.  For example,

 CSC_OPTIONS="-Dsdbm-hash-djb -Dsdbm-big-endian" chicken-install sdbm

configures '''sdbm''' to use the DJB hash and big-endian order.

=== Basic interface

<procedure>(open-database pathname #!key flags mode page-block-power dir-block-power) -> db</procedure>

Opens the existing SDBM database {{pathname}}, or creates an empty database if
{{pathname}} does not exist.  The database resides in two files:
{{pathname.dir}} (directory file) and {{pathname.pag}} (page file).
Returns an opaque database object.

Optional keyword arguments are:

; flags : flags passed to {{file-open}}, default: {{(+ open/rdwr open/creat)}}
; mode : permissions passed to {{file-open}}, default: {{(+ perm/irwxu perm/irgrp perm/iroth)}}
; page-block-power : bytes in each data page, as a power of 2; default: 12 (4096 bytes)
; dir-block-power : bytes in each directory block, as a power of 2; default: 12 (4096 bytes)

The data page size limits the length of a key/value pair, so you may
need to increase it to correspond with your maximum pair size.
An undersized page can lead to frequent hash bucket splits and a
bloated file size with many holes.  An oversized page can incur
disk performance overhead on read and write, since an entire page is
read or written for every operation.  Values between 4096 and
16384 bytes (powers 12 to 14) seem reasonable.

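Since page and directory sizes are given as powers of 2, a small helper can turn a byte count into the corresponding power. {{block-power-for}} is a hypothetical name, and the sketch does not account for any per-page bookkeeping overhead, so leave some headroom above your largest key/value pair.

```scheme
;; Smallest power p such that 2^p is at least BYTES.
(define (block-power-for bytes)
  (let loop ((p 0))
    (if (>= (expt 2 p) bytes)
        p
        (loop (+ p 1)))))

(block-power-for 4096)    ; => 12
(block-power-for 5000)    ; => 13
(block-power-for 16384)   ; => 14
```
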
Note: The SDBM format has no database header, so you must always
specify the same {{page-block-power}} and {{dir-block-power}} for
a given database.  The reference implementation uses a {{page-block-power}}
of 10 (1024 bytes) and a {{dir-block-power}} of 12 (4096 bytes).

<procedure>(close-database db)</procedure>

Closes the database associated with {{db}}.

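As an illustration, a database with 8192-byte pages might be opened and closed as follows, in CHICKEN 4 syntax. The pathname {{example-db}} is hypothetical; the flags and mode shown simply restate the documented defaults, and the same two block powers would have to be passed on every later open of this database.

```scheme
(use sdbm posix)   ; posix supplies open/rdwr, open/creat and the perm/ constants

(define db
  (open-database "example-db"            ; hypothetical path: example-db.dir and example-db.pag
                 flags: (+ open/rdwr open/creat)
                 mode: (+ perm/irwxu perm/irgrp perm/iroth)
                 page-block-power: 13    ; 8192-byte data pages
                 dir-block-power: 12))   ; 4096-byte directory blocks

(close-database db)
```
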
<procedure>(fetch db key) -> val</procedure>

Fetches {{key}} from SDBM database {{db}}, returning the associated value, or
{{#f}} if the key does not exist.  The returned value is a string.
{{key}} is normally a string; if not, it is converted into a string.

<procedure>(store! db key val #!optional (replace #t))</procedure>

Stores the {{key}}, {{val}} pair into SDBM database {{db}}.  {{val}} must
be a string; {{key}} is converted into a string if not already.

If the key exists and the optional argument {{replace}} is {{#t}} (the default),
the pair is replaced.  If {{replace}} is {{#f}} instead,
an error is returned.

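A short usage sketch of {{store!}} and {{fetch}}, assuming a hypothetical database path {{example-db}}. Whether the error for an existing key under {{replace}} {{#f}} is signalled as a condition or handed back as a value is not spelled out here, so the sketch wraps the call in {{handle-exceptions}}, which copes with either.

```scheme
(use sdbm)

(define db (open-database "example-db"))   ; hypothetical path

(store! db "apple" "red")
(fetch db "apple")               ; the stored string
(fetch db "no-such-key")         ; #f for a missing key

(store! db "apple" "green")      ; replace defaults to #t, so this overwrites

;; Key already present and replace is #f: an error results.
;; 'store-refused is an arbitrary marker chosen for this sketch.
(define result
  (handle-exceptions exn 'store-refused
    (store! db "apple" "blue" #f)))

(close-database db)
```
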
<procedure>(delete! db key)</procedure>

Deletes {{key}} from SDBM database {{db}}.  If {{key}} does not
exist, an error is raised.

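Because deleting a missing key raises an error, callers may want a tolerant wrapper. Two possible sketches follow; {{delete-if-present!}} and {{try-delete!}} are hypothetical names, not part of the egg.

```scheme
(use sdbm)

;; Check first: delete KEY only if it is present.
;; Returns #t if a pair was removed, #f otherwise.
(define (delete-if-present! db key)
  (if (fetch db key)
      (begin (delete! db key) #t)
      #f))

;; Alternative: attempt the delete and trap the error.
;; Returns #t on success, #f if delete! raised an error.
(define (try-delete! db key)
  (handle-exceptions exn #f
    (delete! db key)
    #t))
```
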
=== Enumeration

<procedure>(pair-iterator db) -> iter</procedure>

Returns a new pair iterator object that can be used to
iterate over pairs in the SDBM database {{db}}.  Pass
this iterator to {{next-pair}} repeatedly.

<procedure>(next-pair iter) -> (key . val)</procedure>

Returns the next pair that {{iter}}, a {{pair-iterator}}, sees
in the database.  If there are no more pairs, returns {{#f}}.
Otherwise, it returns a {{(key . val)}} pair, where both values
are strings.

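Driving the iterator by hand might look like the following sketch. {{database->alist}} is a hypothetical helper; it is safest to assume the pairs arrive in no particular order.

```scheme
(use sdbm)

;; Collect every (key . val) pair into a list by calling next-pair
;; until it returns #f.
(define (database->alist db)
  (let ((iter (pair-iterator db)))
    (let loop ((acc '()))
      (let ((pair (next-pair iter)))
        (if pair
            (loop (cons pair acc))
            (reverse acc))))))
```
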
<procedure>(pair-fold db kons knil) -> kvs</procedure>

Performs a fold over all pairs in SDBM database {{db}}.  {{knil}}
is the initial value.  {{kons}} is a procedure of three arguments:
{{(key val kvs)}}.  The return value of {{kons}} is passed to the
next call of {{kons}} as {{kvs}}; the value returned by the last call is the result of {{pair-fold}}.

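Two small folds, offered as a sketch: one counts the pairs and one collects the keys. {{pair-count}} and {{database-keys}} are hypothetical names built only on {{pair-fold}} as documented above.

```scheme
(use sdbm)

;; Count the pairs: the accumulator starts at 0 and grows by 1 per pair.
(define (pair-count db)
  (pair-fold db (lambda (key val count) (+ count 1)) 0))

;; Collect the keys: the accumulator starts as the empty list.
(define (database-keys db)
  (pair-fold db (lambda (key val keys) (cons key keys)) '()))
```
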
=== Author

Jim Ursetto

=== Version history

; 0.1.0 : Initial release

=== License

BSD