Changeset 25555 in project


Timestamp: 11/23/11 13:32:16
Author: Alaric Snell-Pym
Message:

ugarit: Many minor improvements to crash safety (see README.txt)

Location: release/4/ugarit/trunk
Files: 1 deleted, 7 edited

  • release/4/ugarit/trunk/README.txt

--- r25528
+++ r25555
 ## Backends
 
+* Carefully document backend API for other backend authors: in
+  particular note behaviour in crash situations - we assume that after
+  a successful flush! all previous blocks are safe, but after a flush,
+  if some blocks make it, then all previous blocks must have. E.g.,
+  writes are done in order and periodically auto-flushed, in
+  effect. This invariant is required for the file-cache to be safe
+  (see v1.0.2).
+
+* Make splitlog see if writing the block will go over the log file
+  size limit, and if so, start a new file - rather than testing AFTER
+  writing, leading to a potential extra partial block beyond the
+  limit. It'd be nice to make it a hard limit.
+
+* Implement lock-tag! etc. in backend-fs, as a precaution against two
+  concurrent snapshots racing over updating the tag, where concurrent
+  access to the archive is even possible.
+
+* Lock the archive for writing in backend-splitlog, so that two
+  snapshots to the same archive don't collide. Do we lock per `put!`
+  to allow interleaving, or is that too inefficient? In which case, we
+  need to hold a lock that persists for a while, and release it
+  periodically to allow other writers to the same archive to have a
+  chance.
+
+* Make backend-splitlog write the current log file offset as well as
+  number into the metadata on each flush, and on startup, either
+  truncate the file to that position (to remove anything written but
+  not flushed to the metadata) or scan the log onwards from that point
+  to find (complete) blocks that did not get flushed to the metadata.
+
+* Make `lock-tag!` in backend-splitlog actually block until the tag is
+  not already locked! With a timeout and an apologetic error message
+  if it takes too long.
+
 * Extend the backend protocol with a special "admin" command that
   allows for arbitrary backend-specific operations, and write an
...
   should be an alist which is displayed to the user in a friendly
   manner, as "Key: Value\n" lines.
-
-* Extend the backend protocol with a `flush` command, such that
-  operations performed without a subsequent `flush` might not "stick" in
-  failure cases (make `close!` have an implicit `flush`, of
-  course). Use this to force an immediate commit in backends that use
-  sqlite, as well as the current practice of committing at close, tag
-  operations, and whenever it seems like a while has passed since the
-  last time. Make the frontend flush at crucial points.
 
 * Implement "info" admin commands for all backends, that list any
...
 ## Core
 
+* Make `fold-archive-node`'s listing of tags at the top level report
+  the lock status of the tags.
+
+* More stats. Log bytes written AFTER compression and encryption in
+  `archive-put!`. Log snapshot start and end times in the snapshot
+  object.
+
 * SIGINFO support. Add a SIGINFO handler that sets a flag, and make
   the `store-file!` and `store-directory!` main loops look for the
...
   ACLs/streams, FAT attributes, etc... Ben says to look at Box Backup
   for some code to do that sort of thing.
-
-* Implement lock-tag! etc. in backend-fs, as a precaution against two
-  concurrent snapshots racing over updating the tag, where concurrent
-  access to the archive is even possible.
 
 * Deletion support - letting you remove snapshots. Perhaps you might
...
 ## Front-end
 
+* Add a command to force removing a tag lock.
+
+* Add a command to list all the tags (with a * next to locked tags)
+
 * Better error messages
 
...
   metadata. Switched to the `posix-extras` egg and ditched our own
   `posixextras.scm` wrappers. Used the `parley` egg in the `ugarit
-  explore` CLI for line editing.
+  explore` CLI for line editing. BUGFIX: Made file cache check the
+  file hashes it finds in the cache actually exist in the archive, to
+  protect against the case where a crash of some kind has caused
+  unflushed changes to be lost; the file cache may well have committed
+  changes that the backend hasn't, leading to references to
+  nonexistent blocks. Note that we assume that archives are
+  sequentially safe, e.g., if the final indirect block of a large file
+  made it, all the partial blocks must have made it too. BUGFIX: Added
+  an explicit `flush!` command to the backend protocol, and put
+  explicit flushes at critical points in higher layers
+  (`backend-cache`, the archive abstraction in the Ugarit core, and
+  when tagging a snapshot) so that we ensure the blocks we point at
+  are flushed before committing references to them in the
+  `backend-cache` or file caches, or into tags, to ensure crash safety.
 
 * 1.0.1: Consistency check on read blocks by default. Removed warning
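The BUGFIX notes in the changelog entry above rest on one ordering rule: flush the backend before committing anything (a file-cache row, a `backend-cache` row, or a tag) that points at blocks. The sketch below models that rule under the README's own assumption that flushed writes survive a crash and unflushed ones may not. The toy store and the names `make-toy-store`, `all-exist?`, `run-scenario` and the `crash!` message are hypothetical, not part of Ugarit's backend API.

```scheme
;; Toy model of the crash-safety ordering; all names are hypothetical.
;; The store keeps two lists: keys that survived a flush!, and keys
;; written since the last flush! (which a simulated crash discards).
(define (make-toy-store)
  (let ((durable '())
        (pending '()))
    (lambda (msg . args)
      (case msg
        ((put!) (set! pending (cons (car args) pending)))
        ((flush!) (set! durable (append pending durable))
                  (set! pending '()))
        ((crash!) (set! pending '()))          ; lose unflushed writes
        ((exists?) (if (or (member (car args) durable)
                           (member (car args) pending))
                       #t
                       #f))
        (else (error "unknown message" msg))))))

;; #t if every key the cache has committed still exists in the store.
(define (all-exist? store keys)
  (cond ((null? keys) #t)
        ((store 'exists? (car keys)) (all-exist? store (cdr keys)))
        (else #f)))

;; Store a block, commit a cache entry pointing at it, then crash.
;; With flush-first? = #t the store is flushed before the cache commit.
(define (run-scenario flush-first?)
  (let ((store (make-toy-store))
        (committed-cache '()))
    (store 'put! "block-1")
    (if flush-first? (store 'flush!))
    (set! committed-cache (cons "block-1" committed-cache))
    (store 'crash!)
    (all-exist? store committed-cache)))

(display (run-scenario #t)) (newline)   ; cache entry still valid
(display (run-scenario #f)) (newline)   ; dangling cache entry
```

Run as-is, it prints `#t` then `#f`: without the flush, the committed cache entry points at a block the simulated crash discarded, which is the dangling-reference case the new file-cache existence check guards against.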
  • release/4/ugarit/trunk/backend-cache.scm

--- r25527
+++ r25555
    (define *updates-since-last-commit* 0)
    (define (flush!)
-     (exec (sql *db* "COMMIT;"))
-     (exec (sql *db* "BEGIN;"))
-     (set! *updates-since-last-commit* 0))
+     (when (> *updates-since-last-commit* 0)
+      (exec (sql *db* "COMMIT;"))
+      (exec (sql *db* "BEGIN;"))
+      (set! *updates-since-last-commit* 0)))
    (define (maybe-flush!)
      (inc! *updates-since-last-commit*)
      (when (> *updates-since-last-commit* commit-interval)
+           ((storage-flush! be))
            (flush!)))
 
...
             (cache-set! key type)
             (void)))
+      (lambda ()                        ; flush!
+        (begin
+          ((storage-flush! be))
+          (flush!)
+          (void)))
       (lambda (key) ; exists?
          (or
...
       (lambda (tag key) ; set-tag!
         ((storage-set-tag! be) tag key)
+        ((storage-flush! be))
         (flush!))
       (lambda (tag) ; tag
...
       (lambda (tag) ; remove-tag!
          ((storage-remove-tag! be) tag)
+         ((storage-flush! be))
          (flush!))
       (lambda (tag) ; lock-tag!
          ((storage-lock-tag! be) tag)
+         ((storage-flush! be))
          (flush!))
       (lambda (tag) ; tag-locked?
...
       (lambda (tag) ; unlock-tag!
          ((storage-unlock-tag! be) tag)
-         (flush!))
+          ((storage-flush! be))
+          (flush!))
       (lambda () ; close!
-         ((begin
-            (exec (sql *db* "COMMIT;"))
-            (close-database *db*)
-            (storage-close! be))))))
+        (begin
+          ((storage-close! be))
+          (exec (sql *db* "COMMIT;"))
+          (close-database *db*)))))
 
 
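In the rewritten `backend-cache.scm`, every exported operation that commits local state (`flush!`, `set-tag!`, `remove-tag!`, `lock-tag!`, `unlock-tag!`) first calls the wrapped backend's `storage-flush!`, and the internal `flush!` now skips the `COMMIT;`/`BEGIN;` pair when nothing has changed. A minimal sketch of that control flow follows, with the SQLite transaction replaced by a counter; `make-cache-flusher` and its messages are hypothetical names, not Ugarit code.

```scheme
;; Sketch of backend-cache's flush! control flow (hypothetical names).
;; storage-flush! stands in for ((storage-flush! be)); the commits
;; counter stands in for issuing COMMIT; BEGIN; on the cache database.
(define (make-cache-flusher storage-flush!)
  (let ((updates 0)
        (commits 0))
    ;; Commit the local cache only if something changed since last time.
    (define (commit-if-dirty!)
      (when (> updates 0)
        (set! commits (+ commits 1))
        (set! updates 0)))
    (lambda (msg)
      (case msg
        ((update!) (set! updates (+ updates 1)))
        ;; Flush the backend first, so the cache never records a key
        ;; whose block could still be lost in a crash.
        ((flush!) (storage-flush!) (commit-if-dirty!))
        ((commits) commits)
        (else (error "unknown message" msg))))))

(define storage-flushes 0)
(define cache
  (make-cache-flusher
   (lambda () (set! storage-flushes (+ storage-flushes 1)))))
(cache 'flush!)    ; nothing pending: backend flushed, no commit
(cache 'update!)
(cache 'flush!)    ; backend flushed, one commit
(cache 'flush!)    ; backend flushed, nothing new to commit
(display (list storage-flushes (cache 'commits))) (newline)
```

Three `flush!` calls reach the backend three times but produce only one commit, since only the second call had a pending update; the final `display` prints `(3 1)`.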
  • release/4/ugarit/trunk/backend-devtools.scm

--- r20740
+++ r25555
       (lambda (key data type) ; put!
          ((storage-put! be) key data type))
+      (lambda () ; flush!
+        ((storage-flush! be)))
       (lambda (key) ; exists?
          ((storage-exists? be) key))
...
       (lambda (key data type) ; put!
          ((storage-put! be) key data type))
+      (lambda () ; flush!
+        ((storage-flush! be)))
       (lambda (key) ; exists?
          ((storage-exists? be) key))
...
             (printf "~A: (put! ~A ~A ~A)\n" name key data type)
             ((storage-put! be) key data type)))
+
+      (lambda () ; flush!
+        (begin
+          (printf "~A: (flush!)\n" name)
+          ((storage-flush! be))))
+
       (lambda (key) ; exists?
          (let ((result ((storage-exists? be) key)))
  • release/4/ugarit/trunk/backend-fs.scm

--- r25527
+++ r25555
                (rename-file (make-name key ".type~") (make-name key ".type"))
                (void))))
+      (lambda () (void)) ; flush! - a no-op for us
       (lambda (key) ; exists?
          (if (file-read-access? (make-name key ".data"))
...
          (*updates-since-last-commit* 0)
          (flush! (lambda ()
-                   (set-metadata "current-logfile" (number->string *logcount*))
-                   (exec (sql *db* "COMMIT;"))
-                   (exec (sql *db* "BEGIN;"))
-                   (set! *updates-since-last-commit* 0)))
+                   (when (> *updates-since-last-commit* 0)
+                    (set-metadata "current-logfile" (number->string *logcount*))
+                    (exec (sql *db* "COMMIT;"))
+                    (exec (sql *db* "BEGIN;"))
+                    (set! *updates-since-last-commit* 0))))
          (maybe-flush! (lambda ()
                          (inc! *updates-since-last-commit*)
...
              (set-block-data! key type *logcount* (+ (string-length header) posn) (u8vector-length data))
              (void)))
+
+         (lambda ()                     ; flush!
+           (flush!)
+           (void))
 
          (lambda (key) ; exists?
  • release/4/ugarit/trunk/test/run.scm

--- r25527
+++ r25555
                                                      (cons snapshot acc))
                                                    '()))
-                 (pp result)
                  (test-assert "History has expected form"
                               (match result
...
                  (test-define-values "Walk the history of tag 'Test' with fold-archive-node" (tag)
                                      (fold-archive-node a '(tag . "Test") (lambda (name dirent acc) (cons (cons name dirent) acc)) '()))
-                 (pp tag)
                  (test-assert "Tag history has expected form"
                               (match tag
  • release/4/ugarit/trunk/ugarit-backend.scm

--- r22228
+++ r25555
          storage-unlinkable?
          storage-put!
+         storage-flush!
          storage-exists?
          storage-get
...
   writable? ; Boolean: Can we call put!, link!, unlink!, set-tag!, lock-tag!, unlock-tag!?
   unlinkable? ; Boolean: Can we call unlink?
-  put! ; Procedure: (put key data type) - stores the data (u8vector) under the key (string) with the given type tag (symbol) and a refcount of 1. Does nothing if the key is already in use.
+  put! ; Procedure: (put! key data type) - stores the data (u8vector) under the key (string) with the given type tag (symbol) and a refcount of 1. Does nothing if the key is already in use.
+  flush! ; Procedure: (flush!) - all previous changes must be flushed to disk by the time the continuation is applied.
   exists? ; Procedure: (exists? key) - returns the type of the block with the given key if it exists, or #f otherwise
   get ; Procedure: (get key) - returns the contents (u8vector) of the block with the given key (string) if it exists, or #f otherwise
...
                (write #t)))
             (loop))
+
+           (('flush!)
+            (with-error-reporting
+             ((storage-flush! storage))
+             (write #t))
+            (loop))
 
            (('exists? key)
...
               (void))
 
+            (lambda ()                  ; flush!
+              (if debug (printf "~a: flush!" command-line))
+              (write `(flush!) commands)
+              (read-response responses)
+              (void))
+
             (lambda (key)               ; exists?
               (if debug (printf "~a: exists?" command-line))
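On the wire, the new command is the bare s-expression `(flush!)`: the client-side proxy in `ugarit-backend.scm` writes it and waits for a response, and the server loop calls the local backend's `storage-flush!` before replying `#t`. The sketch below replays one such round trip over string ports instead of the real pipes. `handle-command` is a hypothetical helper, and Ugarit's `with-error-reporting` wrapper and real error-response format are deliberately left out.

```scheme
;; One request/response round trip for (flush!), using string ports in
;; place of the pipes between the Ugarit frontend and a backend process.
;; handle-command is a hypothetical helper, not Ugarit's server loop.
(define (handle-command in out storage-flush!)
  (let ((cmd (read in)))
    (if (equal? cmd '(flush!))
        (begin
          (storage-flush!)   ; must complete before we acknowledge
          (write #t out))
        ;; The error reply shape here is made up for the sketch.
        (write (list 'error "unsupported command" cmd) out))))

(define flushed? #f)
(define reply-port (open-output-string))
(handle-command (open-input-string "(flush!)")
                reply-port
                (lambda () (set! flushed? #t)))
(display (get-output-string reply-port)) (newline)
```

The acknowledgement is written only after the flush procedure returns, which is what lets the client treat a `#t` reply as "all previous changes are on disk".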
  • release/4/ugarit/trunk/ugarit-core.scm

--- r25528
+++ r25555
          archive-get
          archive-put!
+         archive-flush!
          archive-remove-tag!
          archive-set-tag!
...
          extract-directory!
          extract-object!
+         ; FIXME: These two will be useful in future
+         ;verify-directory!
+         ;verify-object!
          snapshot-directory-tree!
          tag-snapshot!
...
 (define (file-cache-put! archive file-path mtime size key)
   (when (> file-cache-commit-interval (archive-file-cache-updates-uncommitted archive))
+        ((storage-flush! (archive-storage archive))) ; Flush the storage before we commit our cache, for crash safety
         (exec (sql (archive-file-cache archive) "commit;"))
         (exec (sql (archive-file-cache archive) "begin;"))
...
   (void))
 
+(define (archive-flush! archive)
+  ((storage-flush! (archive-storage archive))) ; Flush the storage first, to ensure crash safety
+  (when (archive-file-cache archive)
+        (exec (sql (archive-file-cache archive) "commit;"))
+        (exec (sql (archive-file-cache archive) "begin;"))
+        (set! (archive-file-cache-updates-uncommitted archive) 0)))
+
 (define (archive-exists? archive key)
   ((storage-exists? (archive-storage archive)) key))
...
 
 (define (archive-close! archive)
+  ((storage-close! (archive-storage archive))) ;; This flushes the backend before we flush the file cache, for crash safety
   (when (archive-file-cache archive)
         (exec (sql (archive-file-cache archive) "commit;"))
         (close-database (archive-file-cache archive)))
-  ((storage-close! (archive-storage archive))))
+  (void))
 
 ;;
...
                (size (vector-ref file-stat 5))
                (cache-result (file-cache-get archive file-path mtime size)))
-          (if cache-result ;; FIXME: This assumes that the cached file IS in the archive. Give a configurable option to make it check this, making the file-cache a file hash cache rather than also being an archive presence cache like backend-cache as well, for safety.
+          (if (and cache-result (archive-exists? archive cache-result))
               (begin
                 (inc! (archive-file-cache-hits archive))
...
     (let-values (((snapshot-key snapshot-reused?)
                   (store-sexpr! archive snapshot 'snapshot keys)))
-      (archive-set-tag! archive tag snapshot-key)
+      (archive-flush! archive) ; After this point we can be sure that the snapshot and all blocks it refers to are stably stored
+      (archive-set-tag! archive tag snapshot-key) ; Therefore, we can be confident in saving it in a tag.
       (archive-unlock-tag! archive tag)
       snapshot-key)))
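The `file-cache-get` hunk in `ugarit-core.scm` now trusts a file-cache hit only if the cached key also exists in the archive; otherwise it falls through to the non-cached path. A minimal sketch of that lookup rule follows, with association lists standing in for the SQLite file cache and the archive. `cached-key`, `lookup`, `in-archive?` and the sample paths and keys are hypothetical.

```scheme
;; Sketch of the file-cache hit check (hypothetical names). A hit counts
;; only if the archive really holds the block; a crash may have lost
;; blocks that the file cache had already recorded.
(define (cached-key path cache-lookup archive-exists?)
  (let ((hit (cache-lookup path)))
    (if (and hit (archive-exists? hit))
        hit
        #f)))   ; #f means: re-read and re-store the file

(define file-cache
  '(("/etc/motd" . "abc123")
    ("/etc/hosts" . "def456")))   ; def456 never reached the archive
(define archive-keys '("abc123"))

(define (lookup path)
  (let ((entry (assoc path file-cache)))
    (and entry (cdr entry))))
(define (in-archive? key)
  (if (member key archive-keys) #t #f))

(display (cached-key "/etc/motd" lookup in-archive?)) (newline)
(display (cached-key "/etc/hosts" lookup in-archive?)) (newline)
(display (cached-key "/tmp/new" lookup in-archive?)) (newline)
```

This prints `abc123`, `#f`, `#f`. The `/etc/hosts` row is the case the removed FIXME warned about: a cache entry whose block was lost before the backend committed it.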