Changeset 34196 in project


Timestamp:
06/20/17 20:05:23 (3 months ago)
Author:
svnwiki
Message:

Anonymous wiki edit for IP [70.123.118.182]: added perl code example

File:
1 edited

  • wiki/eggref/4/sdbm

    r34187 r34196  
    2121simple key-value store for non-critical applications.
    2222
    23 === Joint Database Technology - using SDBM as indexing into Flat File Databases
    24 
    25 Where external, binary, persistent, SDBM database files (tied to program hash tables - such as in a PERL application program) can really be made useful is in using the key/value pairs for random access indexing into a huge relational "text" flat file database composed of many flat files (with fixed-length records) exhibiting parent/child (1-to-many record) relationships. The key would be composed of a: single field, single partial field, or a compound key of multiple single and/or partial fields concatenated together (perhaps with a delimiter character between them such as a pipe "|"). The value in the key/value pair would be the location offset (in bytes) to seek to (i.e. position the file pointer) in a flat file at the start of a specific record wished to be random accessed for: READ, READ/WRITE, or APPEND access.  Multiple SDBM files can be setup as alternate indexes into each of the Flat File database text files, each SDBM file containing a different key (composed of a: single field, single partial field, or a compound key of multiple single and/or partial fields concatenated together). An alternate key with duplicates can be created in the SDBM files by making as part of the key, an incremented number perhaps in the range 1-9999. 
     23=== Joint Database Technology - SDBM and Flat File Databases working in tandem
     24
     25External, binary, persistent SDBM database files (see SDBM in the SUCCESSORS section below),
     26tied to program hash tables as in Perl application programs, are most useful when their key/value pairs serve as random-access indexes into a huge relational "text" flat file database: many flat files with fixed-length records that exhibit parent/child (1-to-many record) relationships. The key is a single field, a partial field, or a compound key of several whole and/or partial fields concatenated together (perhaps with a delimiter such as a pipe "|" between them). The value is the byte offset to seek to (i.e. where to position the file pointer) so that the flat file is positioned at the start of the record wanted for READ, READ/WRITE, or APPEND access. Multiple SDBM files can be set up as alternate indexes into each flat file, each SDBM file holding a different key built in the same way. An alternate key with duplicates can be created by making an incremented number, perhaps in the range 1-9999, part of the key.
    2627        Key example:    LastName|IncNbr(perhaps in range 1-9999)
    2728                        "Williams|1" ... "Williams|5745". 
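The duplicate-key scheme in the key example can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the helper names `add_duplicate_key` and `offsets_for` are hypothetical, and a plain in-memory hash stands in for a hash tied to an SDBM file (the same `exists`, store and fetch operations apply to a tied hash).

```perl
use strict;
use warnings;

# Hypothetical helper: store $offset under "$name|N", using the first unused N.
sub add_duplicate_key {
    my ($index, $name, $offset, $max) = @_;
    $max //= 9999;                      # upper bound suggested in the text
    for my $n (1 .. $max) {
        my $key = "$name|$n";
        next if exists $index->{$key};  # slot taken, try the next number
        $index->{$key} = $offset;
        return $key;
    }
    die "no free slot for '$name' in 1..$max\n";
}

# Hypothetical helper: collect every offset stored for $name, in the order added.
sub offsets_for {
    my ($index, $name) = @_;
    my @offsets;
    for (my $n = 1; exists $index->{"$name|$n"}; $n++) {
        push @offsets, $index->{"$name|$n"};
    }
    return @offsets;
}

my %index;   # assumption: in a real program this hash would be tied to an SDBM file
add_duplicate_key(\%index, 'Williams', 0);
add_duplicate_key(\%index, 'Williams', 760);
add_duplicate_key(\%index, 'Smith',    1520);
print join(',', offsets_for(\%index, 'Williams')), "\n";
```

The linear probe for a free number is the simplest possible choice. For names with thousands of duplicates it may be worth keeping a per-name counter under a separate key instead.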
     
    3940FLAT FILE/SDBM, or MDB, relational database systems can employ a file naming convention to make it easy for a DB application user interface to determine which file(s) to look in. Example: a flat file named US_CENSUS_2010_TX_A.txt (or .mdb for MS-Access) would be one way to identify a file logically segregated to contain only data for Texas citizens whose last name begins with the letter "A". A business would need to determine what logical segregation of data makes the most sense for its operational needs. Server-side batch EDIT operations and heavy reporting could be performed during off hours. For common data statistics, a statistics table could be maintained (server-side, during off hours) that answers most user questions with aggregates across the entire database system (as in: stats for the entire U.S., and stats for each of the 50 states, from the example given above).
    4041
    41 For a discussion on this topic, and sample Perl code, go to
    42      http://www.perlmonks.org/?node=joint+database+technology
     42For a discussion on this topic:     http://www.perlmonks.org/?node=joint+database+technology
     43
     44       #-- This Perl program retrieves 5 verses of King James Version Bible text
     45       #-- from a large Flat File (with fixed-length, "text" records) by random access lookup.
     46       #-- The Flat File contains 180 complete copies of the KJV Bible, with a bogus
     47       #-- translation number (tr) assigned to each Bible copy (tr = 1 to 180)
     48       #-- to make a unique key: {translation_nbr + book_nbr + chapter_nbr + verse_nbr}.
     49       #-- Record offsets (in bytes) are persistently stored in a binary Perl SDBM database file,
     50       #-- of key/value pairs, tied to a program hash table. The value is the offset.
     51       #-- The key is {tr + bk + chp + ver} numbers combined/concatenated.
     52       #-- If $offset is a negative value, seek from BOTTOM/END of file.
     53       #-- If $offset is a positive value, seek from BEGIN/TOP of file.
     54       #-- Each Bible contains 31102 verses of text, of max length 528 characters each.
     55       #-- But with the compound index {tr + bk + chp + ver} added to the Bible text,
     56       #--       for the purpose of proving the random access is working, and
     57       #-- MIMEbase64 encoding applied (to hide the Bible text), the fixed
     58       #-- length records have become 760 characters each. Decoding will occur as records are read.
     59       #-- The Flat File is just under 4 GIG. The SDBM file just under 1 GIG.
     60       #-- There are over 5 Million records each, in both the Flat File and SDBM file.
     61       #--    [180 copies of the Bible times 31102 verses per Bible]
     62       #-- Flat File, random access, record lookup, is instantaneous.
     63       #-- You can use Perl Portable Code: sysopen, syswrite, sysseek, sysread.
     64       #-- But the below example is Windows O/S specific Perl Code.
     65       #-- This example is a batch application process (no user front-end), having 5 hard-coded lookup keys.
     66       #-- You can build a user-interface to instead accept the lookup keys from user input: either typed in,
     67       #-- or selected from a GUI widget of preloaded values {tr, bk, chp, ver}.
     68       #-- A RANGE of values could even be selected to print Bible verses for an entire Book (ex. tr="134", bk="01" i.e. Genesis)
     69
     70       use Win32API::File 0.08 qw( :ALL );
     71       use Win32;
     72       use SDBM_File;
     73       use Fcntl;
     74       use MIME::Base64 qw(decode_base64);   
     75 
     76       $PWD=Win32::GetCwd();  #-- working directory
     77       
     78       #--- tie the external binary SDBM file contents (of key/value pairs) to a program hash table.
     79       tie( %BibleVersesIDX, "SDBM_File", '.\BibleFlatFile_760_31102_180_IDX', O_RDONLY, 0444 );
     80
     81       if (tied %BibleVersesIDX) { print "BibleVersesIDX Hash now tied to external SDBM file\n\n"; }   
     82       else { print "Could not tie BibleVersesIDX Hash with external SDBM file - Aborting\n\n";  die; }
     83       
     84       #-- create a file handle to: open, and random access read from, the Flat File of Bible verses.
     85       #-- the flat file already exists, and was preloaded with 5 million plus records.
     86       #-- note: createFile (lowercase "c") is the Perl-friendly wrapper exported by Win32API::File, not the raw CreateFile call.
     87       $hFILE = createFile("$PWD\\BibleFlatFile_760_31102_180.dat", "r") or die "Cannot open flat file: $^E\n";  #-- $hFILE is a native Windows file handle
     88                 
     89       #--           tr bk chp ver
     90       foreach $key ("00101001001", "09066022021", "09101001001", "18001001001", "18066022021") { 
     91             $offset=$BibleVersesIDX{$key};
     92             if ($offset < 0) {
     93                 $pos=SetFilePointer( $hFILE, $offset, [], FILE_END);   #-- moves the file pointer to a specific record at $offset 
     94             } else {
     95                 $pos=SetFilePointer( $hFILE, $offset, [], FILE_BEGIN); #-- moves the file pointer to a specific record at $offset     
     96             }
     97             #-- FYI: Don't rely on $pos, because if location is past 2 GIG mark, $pos (the return value) is wrong.
     98             #-- $offset will be an integer value to seek up to 2 GIG bytes from Top, or Bottom, of a 4 GIG file.
     99 
     100             ReadFile( $hFILE, $Buf, 760, [], [] );  #-- $Buf contains the 760 characters read in from the Flat File
     101             $decoded_Buf=decode_base64($Buf);   #-- MIMEBASE64 decoded to length 570 from 760
     102             $decoded_Buf=~s/ *$//;              #-- remove trailing spaces
     103             print $decoded_Buf . "\n\n";        #-- print to the screen the decoded Bible verse fetched from the Flat File
     104       }
     105       exit;
     106       END {
     107          CloseHandle( $hFILE );
     108          untie( %BibleVersesIDX );
     109          sleep 5;
     110       }
    43111
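The comments in the example mention that portable Perl calls (sysopen, syswrite, sysseek, sysread) can be used instead of the Windows-specific API. A minimal sketch of that variant follows. It is not the author's program: the file names, the 32-byte record length, the `LAST` key and the `read_record` helper are all assumptions made for the demo, which builds a tiny flat file and SDBM index in a temporary directory and then reads records back by offset. It keeps the example's convention that a negative offset is measured from the end of the file.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :seek);   # O_* flags plus SEEK_SET / SEEK_END
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

my $RECLEN = 32;   # demo record length (the example above uses 760)

# Read one fixed-length record at $offset; negative offsets count from end of file.
sub read_record {
    my ($fh, $offset, $len) = @_;
    my $whence = $offset < 0 ? SEEK_END : SEEK_SET;
    defined sysseek($fh, $offset, $whence) or die "sysseek failed: $!";
    my $got = sysread($fh, my $buf, $len);
    die "short read at offset $offset\n" unless defined $got && $got == $len;
    $buf =~ s/ +$//;            # strip the space padding
    return $buf;
}

# Build a small demo flat file and its SDBM index in a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $flat = File::Spec->catfile($dir, 'demo.dat');
my $idx  = File::Spec->catfile($dir, 'demo_IDX');

tie(my %index, 'SDBM_File', $idx, O_RDWR | O_CREAT, 0666)
    or die "cannot tie SDBM file: $!";

sysopen(my $out, $flat, O_WRONLY | O_CREAT | O_TRUNC)
    or die "cannot create $flat: $!";
binmode($out);
my @rows = (['00101001001', 'first record'],
            ['00101001002', 'second record'],
            ['00101001003', 'third record']);
my $offset = 0;
for my $row (@rows) {
    my ($key, $text) = @$row;
    syswrite($out, sprintf("%-${RECLEN}s", $text)) == $RECLEN
        or die "write failed: $!";
    $index{$key} = $offset;     # value = byte offset of the record
    $offset += $RECLEN;
}
close $out;
$index{'LAST'} = -$RECLEN;      # demo key whose offset is measured from the end

# Random-access lookups through the index.
sysopen(my $in, $flat, O_RDONLY) or die "cannot open $flat: $!";
binmode($in);
for my $key ('00101001002', 'LAST') {
    print "$key => ", read_record($in, $index{$key}, $RECLEN), "\n";
}
close $in;
untie %index;
```

For flat files beyond 2 GB this approach presumably requires a perl built with large-file support; that has not been checked against the 4 GB file described above.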
    44112=== Installation