Changeset 25555 in project

11/23/11 13:32:16
Alaric Snell-Pym

ugarit: Many minor improvements to crash safety (see README.txt)

1 deleted
7 edited


  • release/4/ugarit/trunk/README.txt

    r25528 r25555  
    827827## Backends
     829* Carefully document backend API for other backend authors: in
     830  particular note behaviour in crash situations - we assume that after
     831  a successful flush! all previous blocks are safe, but after a flush,
     832  if some blocks make it, then all previous blocks must have. Eg,
     833  writes are done in order and periodically auto-flushed, in
     834  effect. This invariant is required for the file-cache to be safe
     835  (see v1.0.2).
     837* Make splitlog see if writing the block will go over the log file
     838  size limit, and if so, start a new file - rather than testing AFTER
     839  writing, leading to a potential extra partial block beyond the
     840  limit. It'd be nice to make it a hard limit.
     842* Implement lock-tag! etc. in backend-fs, as a precaution against two
     843  concurrent snapshots racing over updating the tag, where concurrent
     844  access to the archive is even possible.
     846* Lock the archive for writing in backend-splitlog, so that two
     847  snapshots to the same archive don't collide. Do we lock per `put!`
     848  to allow interleaving, or is that too inefficient? In which case, we
     849  need to hold a lock that persists for a while, and release it
     850  periodically to allow other writers to the same archive to have a
     851  chance.
     853* Make backend-splitlog write the current log file offset as well as
     854  number into the metadata on each flush, and on startup, either
     855  truncate the file to that position (to remove anything written but
     856  not flushed to the metadata) or scan the log onwards from that point
     857  to find (complete) blocks that did not get flushed to the metadata.
     859* Make `lock-tag!` in backend-splitlog actually block until the tag is
     860  not already locked! With a timeout and an apologetic error message
     861  if it takes too long.
    829863* Extend the backend protocol with a special "admin" command that
    830864  allows for arbitrary backend-specific operations, and write an
    833867  should be an alist which is displayed to the user in a friendly
    834868  manner, as "Key: Value\n" lines.
    836 * Extend the backend protocol with a `flush` command, such that
    837   operations performed without a subsequent `flush` might not "stick" in
    838   failure cases (make `close!` have an implicit `flush`, of
    839   course). Use this to force an immediate commit in backends that use
    840   sqlite, as well as the current practice of committing at close, tag
    841   operations, and whenever it seems like a while has passed since the
    842   last time. Make the frontend flush at crucial points.
    844870* Implement "info" admin commands for all backends, that list any
    932958## Core
     960* Make `fold-archive-node`'s listing of tags at the top level report
     961  the lock status of the tags.
     963* More stats. Log bytes written AFTER compression and encryption in
     964  `archive-put!`. Log snapshot start and end times in the snapshot
     965  object.
    934967* SIGINFO support. Add a SIGINFO handler that sets a flag, and make
    935968  the `store-file!` and `store-directory!` main loops look for the
    9921025  ACLs/streams, FAT attributes, etc... Ben says to look at Box Backup
    9931026  for some code to do that sort of thing.
    995 * Implement lock-tag! etc. in backend-fs, as a precaution against two
    996   concurrent snapshots racing over updating the tag, where concurrent
    997   access to the archive is even possible.
    9991028* Deletion support - letting you remove snapshots. Perhaps you might
    10531082## Front-end
     1084* Add a command to force removing a tag lock.
     1086* Add a command to list all the tags (with a * next to locked tags)
    10551088* Better error messages
    11731206  metadata. Switched to the `posix-extras` egg and ditched our own
    11741207  `posixextras.scm` wrappers. Used the `parley` egg in the `ugarit
    1175   explore` CLI for line editing.
     1208  explore` CLI for line editing. BUGFIX: Made file cache check the
     1209  file hashes it finds in the cache actually exist in the archive, to
     1210  protect against the case where a crash of some kind has caused
     1211  unflushed changes to be lost; the file cache may well have committed
     1212  changes that the backend hasn't, leading to references to
     1213  nonexistent blocks. Note that we assume that archives are
     1214  sequentially safe, eg if the final indirect block of a large file
     1215  made it, all the partial blocks must have made it too. BUGFIX: Added
     1216  an explicit `flush!` command to the backend protocol, and put
     1217  explicit flushes at critical points in higher layers
     1218  (`backend-cache`, the archive abstraction in the Ugarit core, and
     1219  when tagging a snapshot) so that we ensure the blocks we point at
     1220  are flushed before committing references to them in the
     1221  `backend-cache` or file caches, or into tags, to ensure crash safety.
    11771223* 1.0.1: Consistency check on read blocks by default. Removed warning
  • release/4/ugarit/trunk/backend-cache.scm

    r25527 r25555  
    2424   (define *updates-since-last-commit* 0)
    2525   (define (flush!)
    26      (exec (sql *db* "COMMIT;"))
    27      (exec (sql *db* "BEGIN;"))
    28      (set! *updates-since-last-commit* 0))
     26     (when (> *updates-since-last-commit* 0)
     27      (exec (sql *db* "COMMIT;"))
     28      (exec (sql *db* "BEGIN;"))
     29      (set! *updates-since-last-commit* 0)))
    2930   (define (maybe-flush!)
    3031     (inc! *updates-since-last-commit*)
    3132     (when (> *updates-since-last-commit* commit-interval)
     33           ((storage-flush! be))
    3234           (flush!)))
    5961            (cache-set! key type)
    6062            (void)))
     63      (lambda ()                        ; flush!
     64        (begin
     65          ((storage-flush! be))
     66          (flush!)
     67          (void)))
    6168      (lambda (key) ; exists?
    6269         (or
    7683      (lambda (tag key) ; set-tag!
    7784        ((storage-set-tag! be) tag key)
     85        ((storage-flush! be))
    7886        (flush!))
    7987      (lambda (tag) ; tag
    8391      (lambda (tag) ; remove-tag!
    8492         ((storage-remove-tag! be) tag)
     93         ((storage-flush! be))
    8594         (flush!))
    8695      (lambda (tag) ; lock-tag!
    8796         ((storage-lock-tag! be) tag)
     97         ((storage-flush! be))
    8898         (flush!))
    8999      (lambda (tag) ; tag-locked?
    91101      (lambda (tag) ; unlock-tag!
    92102         ((storage-unlock-tag! be) tag)
    93          (flush!))
     103          ((storage-flush! be))
     104          (flush!))
    94105      (lambda () ; close!
    95          ((begin
    96             (exec (sql *db* "COMMIT;"))
    97             (close-database *db*)
    98             (storage-close! be))))))
     106        (begin
     107          ((storage-close! be))
     108          (exec (sql *db* "COMMIT;"))
     109          (close-database *db*)))))
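The backend-cache.scm changes above apply one ordering rule: flush the wrapped backend first, then commit the cache database, and skip the commit when no updates are pending. A minimal sketch of that rule in portable Scheme, assuming in-memory stand-ins in place of sqlite and the real backend. The names `events`, `record!`, `backend-flush!`, `cache-commit!`, `cache-note-update!` and `flush-all!` are hypothetical and do not appear in Ugarit:

```scheme
;; Hypothetical model: `events` records the order of durable operations.
(define events '())
(define (record! what) (set! events (cons what events)))

;; Stand-in for the wrapped backend's flush! procedure.
(define (backend-flush!) (record! 'backend-flush))

;; Stand-in for the cache database commit (COMMIT followed by BEGIN).
(define updates-since-last-commit 0)
(define (cache-commit!)
  (when (> updates-since-last-commit 0) ; skip the commit when nothing changed
    (record! 'cache-commit)
    (set! updates-since-last-commit 0)))

(define (cache-note-update!)
  (set! updates-since-last-commit (+ updates-since-last-commit 1)))

;; Backend first, cache second: the cache never commits a reference
;; to a block that the backend has not yet been asked to flush.
(define (flush-all!)
  (backend-flush!)
  (cache-commit!))

(cache-note-update!)
(flush-all!)
(flush-all!) ; the second call finds nothing to commit in the cache
(define event-order (reverse events))
```

In this model `event-order` lists a backend flush before every cache commit, which is the property the changeset relies on for crash safety.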
  • release/4/ugarit/trunk/backend-devtools.scm

    r20740 r25555  
    66      (lambda (key data type) ; put!
    77         ((storage-put! be) key data type))
     8      (lambda () ; flush!
     9        ((storage-flush! be)))
    810      (lambda (key) ; exists?
    911         ((storage-exists? be) key))
    3840      (lambda (key data type) ; put!
    3941         ((storage-put! be) key data type))
     42      (lambda () ; flush!
     43        ((storage-flush! be)))
    4044      (lambda (key) ; exists?
    4145         ((storage-exists? be) key))
    7276            (printf "~A: (put! ~A ~A ~A)\n" name key data type)
    7377            ((storage-put! be) key data type)))
     79      (lambda () ; flush!
     80        (begin
     81          (printf "~A: (flush!)\n" name)
     82          ((storage-flush! be))))
    7484      (lambda (key) ; exists?
    7585         (let ((result ((storage-exists? be) key)))
  • release/4/ugarit/trunk/backend-fs.scm

    r25527 r25555  
    7676               (rename-file (make-name key ".type~") (make-name key ".type"))
    7777               (void))))
     78      (lambda () (void)) ; flush! - a no-op for us
    7879      (lambda (key) ; exists?
    7980         (if (file-read-access? (make-name key ".data"))
    211212         (*updates-since-last-commit* 0)
    212213         (flush! (lambda ()
    213                    (set-metadata "current-logfile" (number->string *logcount*))
    214                    (exec (sql *db* "COMMIT;"))
    215                    (exec (sql *db* "BEGIN;"))
    216                    (set! *updates-since-last-commit* 0)))
     214                   (when (> *updates-since-last-commit* 0)
     215                    (set-metadata "current-logfile" (number->string *logcount*))
     216                    (exec (sql *db* "COMMIT;"))
     217                    (exec (sql *db* "BEGIN;"))
     218                    (set! *updates-since-last-commit* 0))))
    217219         (maybe-flush! (lambda ()
    218220                         (inc! *updates-since-last-commit*)
    286288             (set-block-data! key type *logcount* (+ (string-length header) posn) (u8vector-length data))
    287289             (void)))
     291         (lambda ()                     ; flush!
     292           (flush!)
     293           (void))
    289295         (lambda (key) ; exists?
  • release/4/ugarit/trunk/test/run.scm

    r25527 r25555  
    478478                                                     (cons snapshot acc))
    479479                                                   '()))
    480                  (pp result)
    481480                 (test-assert "History has expected form"
    482481                              (match result
    505504                 (test-define-values "Walk the history of tag 'Test' with fold-archive-node" (tag)
    506505                                     (fold-archive-node a '(tag . "Test") (lambda (name dirent acc) (cons (cons name dirent) acc)) '()))
    507                  (pp tag)
    508506                 (test-assert "Tag history has expected form"
    509507                              (match tag
  • release/4/ugarit/trunk/ugarit-backend.scm

    r22228 r25555  
    66         storage-unlinkable?
    77         storage-put!
     8         storage-flush!
    89         storage-exists?
    910         storage-get
    3637  writable? ; Boolean: Can we call put!, link!, unlink!, set-tag!, lock-tag!, unlock-tag!?
    3738  unlinkable? ; Boolean: Can we call unlink?
    38   put! ; Procedure: (put key data type) - stores the data (u8vector) under the key (string) with the given type tag (symbol) and a refcount of 1. Does nothing of the key is already in use.
     39  put! ; Procedure: (put! key data type) - stores the data (u8vector) under the key (string) with the given type tag (symbol) and a refcount of 1. Does nothing if the key is already in use.
     40  flush! ; Procedure: (flush!) - all previous changes must be flushed to disk by the time the continuation is applied.
    3941  exists? ; Procedure: (exists? key) - returns the type of the block with the given key if it exists, or #f otherwise
    4042  get ; Procedure: (get key) - returns the contents (u8vector) of the block with the given key (string) if it exists, or #f otherwise
    99101               (write #t)))
    100102            (loop))
     104           (('flush!)
     105            (with-error-reporting
     106             ((storage-flush! storage))
     107             (write #t))
     108            (loop))
    102110           (('exists? key)
    229237              (void))
     239            (lambda ()                  ; flush!
     240              (if debug (printf "~a: flush!" command-line))
     241              (write `(flush!) commands)
     242              (read-response responses)
     243              (void))
    231245            (lambda (key)               ; exists?
    232246              (if debug (printf "~a: exists?" command-line))
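The ugarit-backend.scm hunks above add a `(flush!)` command to the s-expression protocol: the server side calls the storage's flush procedure and writes `#t`, and the client side writes `(flush!)` and reads the response. A rough sketch of the server half using string ports, assuming a hypothetical `serve` loop and a counter in place of a real storage. The reply for unknown commands is an assumption for illustration and is not taken from the source:

```scheme
;; Hypothetical stand-in for a storage flush!: just counts calls.
(define flush-count 0)
(define (storage-flush!) (set! flush-count (+ flush-count 1)))

;; Read commands from `in` until end of input, replying on `out`.
(define (serve in out)
  (let loop ()
    (let ((command (read in)))
      (cond
       ((eof-object? command) 'done)
       ((equal? command '(flush!))
        (storage-flush!)  ; make earlier writes durable first
        (write #t out)    ; then acknowledge, as the diff does with (write #t)
        (loop))
       (else
        ;; Assumed error reply shape, for illustration only.
        (write (list 'error "unknown command") out)
        (loop))))))

(define in (open-input-string "(flush!) (flush!)"))
(define out (open-output-string))
(serve in out)
(define replies (get-output-string out))
```

The acknowledgement is written only after the flush returns, so a client that has read the response can treat its earlier writes as flushed.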
  • release/4/ugarit/trunk/ugarit-core.scm

    r25528 r25555  
    1515         archive-get
    1616         archive-put!
     17         archive-flush!
    1718         archive-remove-tag!
    1819         archive-set-tag!
    5556         extract-directory!
    5657         extract-object!
     58         ; FIXME: These two will be useful in future
     59         ;verify-directory!
     60         ;verify-object!
    5761         snapshot-directory-tree!
    5862         tag-snapshot!
    146150(define (file-cache-put! archive file-path mtime size key)
    147151  (when (> file-cache-commit-interval (archive-file-cache-updates-uncommitted archive))
     152        ((storage-flush! (archive-storage archive))) ; Flush the storage before we commit our cache, for crash safety
    148153        (exec (sql (archive-file-cache archive) "commit;"))
    149154        (exec (sql (archive-file-cache archive) "begin;"))
    358363  (void))
     365(define (archive-flush! archive)
     366  ((storage-flush! (archive-storage archive))) ; Flush the storage first, to ensure crash safety
     367  (when (archive-file-cache archive)
     368        (exec (sql (archive-file-cache archive) "commit;"))
     369        (exec (sql (archive-file-cache archive) "begin;"))
     370        (set! (archive-file-cache-updates-uncommitted archive) 0)))
    360372(define (archive-exists? archive key)
    361373  ((storage-exists? (archive-storage archive)) key))
    413425(define (archive-close! archive)
     426  ((storage-close! (archive-storage archive))) ;; This flushes the backend before we flush the file cache, for crash safety
    414427  (when (archive-file-cache archive)
    415428        (exec (sql (archive-file-cache archive) "commit;"))
    416429        (close-database (archive-file-cache archive)))
    417   ((storage-close! (archive-storage archive))))
     430  (void))
    626639               (size (vector-ref file-stat 5))
    627640               (cache-result (file-cache-get archive file-path mtime size)))
    628           (if cache-result ;; FIXME: This assumes that the cached file IS in the archive. Give a configurable option to make it check this, making the file-cache a file hash cache rather than also being an archive presence cache like backend-cache as well, for safety.
     641          (if (and cache-result (archive-exists? archive cache-result))
    629642              (begin
    630643                (inc! (archive-file-cache-hits archive))
    10851098    (let-values (((snapshot-key snapshot-reused?)
    10861099                  (store-sexpr! archive snapshot 'snapshot keys)))
    1087       (archive-set-tag! archive tag snapshot-key)
     1100      (archive-flush! archive) ; After this point we can be sure that the snapshot and all blocks it refers to are stably stored
     1101      (archive-set-tag! archive tag snapshot-key) ; Therefore, we can be confident in saving it in a tag.
    10881102      (archive-unlock-tag! archive tag)
    10891103      snapshot-key)))
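The ugarit-core.scm change to the file cache lookup trusts a cached key only if the archive still reports that block as present, which guards against a crash that lost unflushed backend writes after the file cache had already committed. A minimal sketch of that check, assuming association lists in place of the sqlite file cache and the backend. The names `archive-blocks`, `file-cache`, `block-exists?` and `trusted-cache-lookup`, and the sample paths and keys, are hypothetical:

```scheme
;; Hypothetical in-memory stand-ins; the real code uses sqlite and a backend.
(define archive-blocks '("key-a"))                 ; blocks the backend really has
(define file-cache '(("/etc/hosts" . "key-a")      ; cached path -> block key
                     ("/etc/motd"  . "key-lost"))) ; cached, but the block was lost

(define (block-exists? key)
  (if (member key archive-blocks) #t #f))

;; Return the cached key only when the archive still holds that block;
;; otherwise return #f so the caller stores the file again.
(define (trusted-cache-lookup path)
  (let ((entry (assoc path file-cache)))
    (and entry
         (block-exists? (cdr entry))
         (cdr entry))))
```

Here a lookup of "/etc/hosts" yields its key, while "/etc/motd" is treated as a cache miss because its block is absent from the archive.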