source: project/release/4/ugarit/trunk/README.txt @ 25477

Last change on this file since 25477 was 25477, checked in by Alaric Snell-Pym, 10 years ago

ugarit: Unit test suite now covers everything except fold-archive-node over directories (but that's really hard to test, and really simple to implement, so not worth testing, right?)

# Introduction

Ugarit is a backup/archival system based around content-addressable storage.

This allows it to upload incremental backups to a remote server or a local filesystem such as an NFS share or a removable hard disk, yet have the archive instantly able to produce a full snapshot on demand rather than needing to download a full snapshot plus all the incrementals since. The content-addressable storage technique means that the incrementals can be applied to a snapshot on various kinds of storage without needing intelligence in the storage itself - so the snapshots can live within Amazon S3 or on a removable hard disk.

Also, the same storage can be shared between multiple systems that all back up to it - and the incremental upload algorithm means that any files shared between the servers will only need to be uploaded once. If you back up a complete server, then go and back up another that is running the same distribution, then all the files in `/bin` and so on that are already in the storage will not need to be backed up again; the system will automatically spot that they're already there, and not upload them again.
## So what's that mean in practice?

You can run Ugarit to back up any number of filesystems to a shared archive, and on every backup, Ugarit will only upload files or parts of files that aren't already in the archive - be they from the previous snapshot, earlier snapshots, snapshots of entirely unrelated filesystems, etc. Every time you do a snapshot, Ugarit builds a complete directory tree of the snapshot in the archive - but reusing any parts of files, files, or entire directories that already exist anywhere in the archive, and only uploading what doesn't already exist.

The support for parts of files means that, in many cases, gigantic files like database tables and virtual disks for virtual machines will not need to be uploaded entirely every time they change, as only the changed sections will be identified and uploaded.

Because a complete directory tree exists in the archive for any snapshot, the extraction algorithm is incredibly simple - and, therefore, incredibly reliable and fast. Simple, reliable, and fast are just what you need when you're trying to reconstruct the filesystem of a live server.

Also, it means that you can do lots of small snapshots. If you run a snapshot every hour, then only a megabyte or two might have changed in your filesystem, so you only upload a megabyte or two - yet you end up with a complete history of your filesystem at hourly intervals in the archive.

Conventional backup systems usually store a full backup, then incrementals, in their archives. Doing a restore then involves reading the full backup and every incremental since, and applying them - so to do a restore, you have to download *every version* of the filesystem you've ever uploaded, or you have to do periodic full backups (even though most of your filesystem won't have changed since the last full backup) to reduce the number of incrementals required for a restore. Better results are had from systems that use a special backup server to look after the archive storage, which accepts incremental backups and applies them to the snapshot it keeps, in order to maintain a most-recent snapshot that can be downloaded in a single run; but they then restrict you to using dedicated servers as your archive stores, ruling out cheap scalable solutions like Amazon S3, or just backing up to a removable USB or eSATA disk you attach to your system whenever you do a backup. And dedicated backup servers are complex pieces of software; can you rely on something complex for the fundamental foundation of your data security system?
## System Requirements

Ugarit should run on any POSIX-compliant system that can run [Chicken Scheme](http://www.call-cc.org/). It stores and restores all the file attributes reported by the `stat` system call - POSIX mode permissions, UID, GID, mtime, and optionally atime and ctime (although the ctime cannot be restored due to POSIX restrictions). Ugarit will store files, directories, block and character special files, symlinks, and FIFOs.

Support for extended filesystem attributes - ACLs, alternative streams, forks and other metadata - is possible, due to the extensible directory entry format; support for such metadata will be added as required.

Currently, only local filesystem-based archive storage backends are complete: these are suitable for backing up to a removable hard disk or a filesystem shared via NFS or other protocols. They can also be used to snapshot to local disks, although this is obviously then vulnerable to local system failures; if the computer that's being backed up catches fire, you won't be able to restore it from archives that were also ruined!

However, the next backends to be implemented will be one for Amazon S3, and an SFTP backend for storing archives anywhere you can ssh to. Other backends will be implemented on demand; an archive can, in principle, be stored on anything that can store files by name, report on whether a file already exists, and efficiently download a file by name. This rules out magnetic tapes due to their requirement for sequential access.

Although we need to trust that a backend won't lose data (for now), we don't need to trust the backend not to snoop on us, as Ugarit optionally encrypts everything sent to the archive.
## What's in an archive?

An Ugarit archive contains a load of blocks, each up to a maximum size (usually 1MiB, although some backends might impose smaller limits). Each block is identified by the Tiger hash of its contents; this is how Ugarit avoids ever uploading the same data twice - before uploading, it looks up the hash to check whether that data already exists in the archive. The contents of the blocks are compressed and then encrypted before upload.

Every file uploaded is, unless it's small enough to fit in a single block, chopped into blocks, and each block is uploaded. This way, the entire contents of your filesystem can be uploaded - or, at least, only the parts of it that aren't already there! The blocks are then tied together to create a snapshot: index blocks full of the Tiger hashes of the data blocks are uploaded, along with directory blocks listing the names and attributes of files in directories, together with the hashes of the blocks that contain the files' contents. Even the blocks that contain lists of hashes of other blocks are checked for pre-existence in the archive; if only a few MiB of your hundred-GiB filesystem has changed, then even the index blocks and directory blocks are re-used from previous snapshots.
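The chop-hash-and-dedupe behaviour described above can be sketched in a few lines. This is an illustrative Python model only: the names, the use of SHA-256, and the tiny block size are stand-ins (Ugarit really uses Tiger hashes and roughly 1MiB blocks), and the archive is modelled as a plain dict.

```python
import hashlib

BLOCK_SIZE = 4  # illustrative only; Ugarit's real blocks are ~1MiB

def store_file(data: bytes, archive: dict) -> list:
    """Split data into blocks, upload only blocks not already present,
    and return the list of block hashes (the file's 'index')."""
    hashes = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        key = hashlib.sha256(block).hexdigest()
        if key not in archive:        # pre-existence check: never upload twice
            archive[key] = block
        hashes.append(key)
    return hashes
```

Storing the same data a second time adds nothing new to the archive; only the list of hashes is produced again.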
Once uploaded, a block in the archive is never again changed. After all, if its contents changed, its hash would change, so it would no longer be the same block! However, every block has a reference count, tracking the number of index blocks that refer to it. This means that the archive knows which blocks are shared between multiple snapshots (or shared *within* a snapshot - if a filesystem has more than one copy of the same file, still only one copy is uploaded), so that if a given snapshot is deleted, the blocks used only by that snapshot can be deleted to free up space, without corrupting other snapshots by deleting blocks they share. Bear in mind, however, that not all storage backends support this - there are certain advantages to an append-only archive. For a start, you can't delete something by accident! The supplied filesystem backend supports deletion, while the logfile backend does not. However, the actual deletion command hasn't been implemented yet either, so it's a moot point for now...
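The reference-counting idea can be modelled simply. This is a hypothetical sketch, not Ugarit's actual API: the archive and the reference counts are plain dicts, and the function names are made up for illustration.

```python
def link_block(refcounts: dict, key: str) -> None:
    """Record one more reference to a block (e.g. from a new index block)."""
    refcounts[key] = refcounts.get(key, 0) + 1

def unlink_block(archive: dict, refcounts: dict, key: str) -> None:
    """Drop one reference; reclaim the block only when nothing uses it."""
    refcounts[key] -= 1
    if refcounts[key] == 0:           # no snapshot references it any more
        del refcounts[key]
        del archive[key]              # safe to reclaim the space
```

A block shared by two snapshots survives the deletion of one of them, because its count only drops to zero when the last referrer goes away.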
Finally, the archive contains objects called tags. Unlike the blocks, a tag's contents can change, and tags have meaningful names rather than being identified by hash. Tags identify the top-level blocks of snapshots within the system, from which (by following the chain of hashes down through the index blocks) the entire contents of a snapshot may be found. Unless you happen to have recorded the hash of a snapshot somewhere, the tags are where you find snapshots when you want to do a restore!

Whenever a snapshot is taken, as soon as Ugarit has uploaded all the files, directories, and index blocks required, it looks up the tag you have identified as the target of the snapshot. If the tag already exists, then the snapshot it currently points to is recorded in the new snapshot as the "previous snapshot"; then a snapshot header, containing the previous snapshot's hash along with the date and time and any comments you provide, is uploaded (as another block, identified by its hash). The tag is then updated to point to the new snapshot.

This way, each tag actually identifies a chronological chain of snapshots. Normally, you would use a tag to identify a filesystem being archived; you'd keep snapshotting the filesystem to the same tag, resulting in all the snapshots of that filesystem hanging from the tag. But if you want to remember any particular snapshot (perhaps the one you take before a big upgrade or other risky operation), you can duplicate the tag, in effect 'forking' the chain of snapshots much like a branch in a version control system.
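The tag-and-chain mechanism above can be sketched as mutable pointers into a store of immutable snapshot records. This is purely illustrative Python (in the real system, snapshots are hash-identified blocks in the archive; here they are dict entries with made-up identifiers):

```python
def snapshot(archive: dict, tags: dict, tag: str, root: str, sid: str) -> None:
    """Record a new snapshot whose parent is whatever the tag points at."""
    previous = tags.get(tag)                   # existing tag becomes the parent
    archive[sid] = {"root": root, "previous": previous}
    tags[tag] = sid                            # tag now points at the new snapshot

def fork(tags: dict, existing: str, new: str) -> None:
    """Duplicate a tag: both now share the same history of snapshots."""
    tags[new] = tags[existing]
```

After a fork, further snapshots to the original tag advance it alone, while the forked tag keeps pointing at the snapshot it was forked from - just like a branch in a version control system.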
# Using Ugarit

## Installation

Install [Chicken Scheme](http://www.call-cc.org/) using their installation instructions.

Ugarit can then be installed by typing (as root):

    chicken-install ugarit

See the `chicken-install` manual for details if you have any trouble, or wish to install into your home directory.
## Setting up an archive

Firstly, you need to know the archive identifier for the place you'll be storing your archives. This depends on your backend.

### Filesystem backend

The filesystem backend creates archives by storing each block or tag in its own file, in a directory. To keep the objects-per-directory count down, it'll split the files into subdirectories.
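One common way to keep the objects-per-directory count down is to shard files into subdirectories named after a prefix of the block's hash. This is an illustrative sketch of that general technique, not necessarily Ugarit's exact on-disk layout:

```python
import os

def block_path(root: str, block_hash: str) -> str:
    """Place each block under root/<first two hash chars>/<full hash>,
    so a million blocks spread over 256 subdirectories."""
    return os.path.join(root, block_hash[:2], block_hash)
```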
To set up a new filesystem-backend archive, just create an empty directory that Ugarit will have write access to when it runs. It will probably run as root in order to be able to access the contents of files that aren't world-readable (although that's up to you), so be careful of NFS mounts that have `maproot=nobody` set!

You can then refer to it using the following archive identifier:

      fs "...path to directory..."
### New Logfile backend

The logfile backend works much like the original Venti system. It's append-only - you won't be able to delete old snapshots from a logfile archive, even when I implement deletion. It stores the archive in two sets of files: one is a log of data blocks, split at a specified maximum size, and the other is the metadata - a GDBM file used as an index to locate blocks in the logfiles and to store the blocks' types, a GDBM file of tags, and a counter file used in naming logfiles.
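The split-log layout above can be sketched as an append-only store plus a lookup index. This is a rough Python model for illustration only: the GDBM index is a plain dict, logfiles are byte strings, and the tiny maximum size is a stand-in for the real ~900 MB default.

```python
MAX_LOG_SIZE = 8  # tiny, for illustration; the real default is ~900 MB

def append_block(logs: list, index: dict, key: str, data: bytes) -> None:
    """Append a block to the current logfile, starting a new logfile if
    the current one would exceed the maximum size; record its location."""
    if not logs or len(logs[-1]) + len(data) > MAX_LOG_SIZE:
        logs.append(b"")                       # start a new logfile
    file_no, offset = len(logs) - 1, len(logs[-1])
    logs[file_no] += data
    index[key] = (file_no, offset, len(data))  # where to find the block later

def read_block(logs: list, index: dict, key: str) -> bytes:
    """Fetch a block via the index: (logfile number, offset, length)."""
    file_no, offset, length = index[key]
    return logs[file_no][offset:offset + length]
```

This also shows why the index is a reconstructible cache: everything it records can be recomputed by re-reading the logs from the start.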
To set up a new logfile archive, just choose where to put the two sets of files. It would be nice to put the metadata on a different physical disk to the logs, to reduce seeking. Create a directory for each, or if you only have one disk, you can put them all in the same directory.

You can then refer to it using the following archive identifier:

      splitlog "...log directory..." "...metadata directory..." max-logfile-size

For most platforms, a max-logfile-size of 900000000 (900 MB) should suffice. For now, don't go much bigger than that on 32-bit systems until Chicken's `file-position` function is fixed to work with files >1GB in size.
### Old Logfile backend

The old logfile backend works much like the original Venti system. It's append-only - you won't be able to delete old snapshots from a logfile archive, even when I implement deletion. It stores the archive in three files: one is a log of data blocks, one is a GDBM index that remembers where in the log each block resides, and one is a GDBM file of tags.

This worked well, but exposed a bug in Chicken when dealing with files of more than about a gigabyte on 32-bit platforms. I fixed that in short order, but it reminded me that some platforms don't like files larger than 2GB anyway, so I wrote a new logfile backend that splits the log file into chunks at a specified size. You probably want to use the new backend - the old backend is kept for compatibility only.

To set up an old logfile archive, just choose where to put the three files. It would be nice to put the index and tags on a different physical disk to the log, to reduce seeking.

You can then refer to it using the following archive identifier:

      log "...logfile..." "...indexfile..." "...tagsfile..."

None of the files need to exist in advance; Ugarit will create them.
## Writing a ugarit.conf

`ugarit.conf` should look something like this:

      (storage <archive identifier>)
      (hash tiger "<A secret string>")
      [(compression [deflate|lzma])]
      [(encryption aes <key>)]
      [(file-cache "<path>")]
      [(rule ...)]

The hash line chooses a hash algorithm. Currently Tiger-192 (`tiger`), SHA-256 (`sha256`), SHA-384 (`sha384`) and SHA-512 (`sha512`) are supported; if you omit the line then Tiger will still be used, but it will be a simple hash of the block with the block type appended, which reveals to attackers what blocks you have (as the hash is of the unencrypted block, and the hash is not encrypted). This is useful for development and testing, or for use with trusted archives, but not advised for archives that attackers may snoop at. Providing a secret string produces a hash function that hashes the block, the type of block, and the secret string together, producing hashes that attackers who can snoop the archive cannot use to find known blocks. Whichever hash function you use, you will need to install the required Chicken egg with one of the following commands:
    sudo chicken-install tiger-hash  # for tiger
    sudo chicken-install sha2        # for the SHA hashes
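The difference between the plain and keyed hashing modes described above can be illustrated as follows. This sketch uses SHA-256 and simple concatenation as stand-ins (Ugarit defaults to Tiger-192, and its exact construction may differ); the point is only what mixing in a secret buys you.

```python
import hashlib

def plain_hash(block: bytes, block_type: bytes) -> str:
    """No secret: anyone who can snoop the archive can hash a known
    block themselves and test whether you have it."""
    return hashlib.sha256(block + block_type).hexdigest()

def keyed_hash(block: bytes, block_type: bytes, secret: bytes) -> str:
    """Mixing in a secret string means snoopers cannot compute the
    hash of a known block without knowing the secret."""
    return hashlib.sha256(block + block_type + secret).hexdigest()
```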
`lzma` is the recommended compression option for low-bandwidth backends or when space is tight, but it's very slow to compress; deflate or no compression at all are better for fast local archives. To have no compression at all, just remove the `(compression ...)` line entirely. Likewise, to use compression, you need to install a Chicken egg:

       sudo chicken-install z3       # for deflate
       sudo chicken-install lzma     # for lzma
Likewise, the `(encryption ...)` line may be omitted to have no encryption; the only currently supported algorithm is aes (in CBC mode) with a key given in hex, as a passphrase (hashed to get a key), or as a passphrase read from the terminal on every run. The key may be 16, 24, or 32 bytes for 128-bit, 192-bit or 256-bit AES. To specify a hex key, just supply it as a string, like so:

      (encryption aes "00112233445566778899AABBCCDDEEFF")

...for 128-bit AES,

      (encryption aes "00112233445566778899AABBCCDDEEFF0011223344556677")

...for 192-bit AES, or

      (encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")

...for 256-bit AES.

Alternatively, you can provide a passphrase, and specify how large a key you want it turned into, like so:

      (encryption aes ([16|24|32] "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))

Finally, the extra-paranoid can request that Ugarit prompt for a passphrase on every run and hash it into a key of the specified length, like so:

      (encryption aes ([16|24|32] prompt))

(Note the lack of quotes around `prompt`, distinguishing it from a passphrase.)

Again, as it is an optional feature, to use encryption you must install the appropriate Chicken egg:

       sudo chicken-install aes
A file cache, if enabled, significantly speeds up subsequent snapshots of a filesystem tree. The file cache is a file (which Ugarit will create if it doesn't already exist) mapping filenames to (mtime,hash) pairs; as it scans the filesystem, if Ugarit finds a file in the cache and its mtime has not changed, it will assume it is already archived under the specified hash. This saves it from having to read the entire file to hash it and then check if the hash is present in the archive. In other words, if only a few files have changed since the last snapshot, then snapshotting a directory tree becomes an O(N) operation, where N is the number of files, rather than an O(M) operation, where M is the total size of the files involved.
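The file-cache logic boils down to a small check. A minimal sketch, assuming a dict-shaped cache and SHA-256 as a stand-in hash (Ugarit's real cache is a file on disk and uses the configured hash function):

```python
import hashlib
import os

def snapshot_file(path: str, cache: dict) -> str:
    """Return the file's content hash, reading the file only when its
    mtime differs from the cached (mtime, hash) pair."""
    mtime = os.stat(path).st_mtime
    cached = cache.get(path)
    if cached and cached[0] == mtime:
        return cached[1]              # cache hit: no need to re-read the file
    with open(path, "rb") as f:       # cache miss: hash the whole file
        digest = hashlib.sha256(f.read()).hexdigest()
    cache[path] = (mtime, digest)
    return digest
```

On a second snapshot where nothing changed, every lookup takes the cache-hit path, which is what turns the cost from "total bytes" into "number of files".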
For example:

      (storage splitlog "/net/spiderman/archive/logs" "/net/spiderman/archive/index" 900000000)
      (hash tiger "Giung0ahKahsh9ahphu5EiGhAhth4eeyDahs2aiWAlohr6raYeequ8uiUr3Oojoh")
      (encryption aes (32 "deing2Aechediequohdo6Thuvu0OLoh6fohngio9koush9euX6el9iesh6Aef4augh3WiY7phahmesh2Theeziniem5hushai5zigushohnah1quae1ooXo0eingu1Aifeo1eeSheaz9ieSie9tieneibeiPho0quu6um8weiyagh4kaeshooThooNgeyoul2Ahsahgh8imohw3hoyazai9gaph5ohhaechiedeenusaeghahghipe8ii3oo9choh5cieth5iev3jiedohquai4Thiedah5sah5kohcepheixai3aiPainozooc6zohNeiy6Jeigeesie5eithoo0ciiNae8Nee3eiSuKaiza0VaiPai2eeFooNgeengaif9yaiv9rathuoQuohy0ohth6OiL9aisaetheeWoh9aiQu0yoo6aequ3quoiChi7joonohwuvaipeuh2eiPoogh1Ie8tiequesoshaeBue5ieca8eerah0quieJoNoh3Jiesh1chei8weidixeen1yah1ioChie0xaimahWeeriex5eetiichahP9iey5ux7ahGhei7eejahxooch5eiqu0Pheir9Reiri4ahqueijuchae8eeyieMeixa4ciisioloe9oaroof1eegh4idaeNg5aepeip8mah7ixaiSohtoxaiH4oe5eeGoh4eemu7mee8ietaecu6Zoodoo0hoP5uquaish2ahc7nooshi0Aidae2Zee4pheeZee3taerae6Aepu2Ayaith2iivohp8Wuikohvae2Peange6zeihep8eC9mee8johshaech1Ubohd4Ko5caequaezaigohyai1TheeN6Gohva6jinguev4oox2eet5auv0aiyeo7eJieGheebaeMahshifaeDohy8quut4ueFei3eiCheimoechoo2EegiveeDah1sohs7ezee3oaWa2iiv2Chi1haiS5ahph4phu5su0hiocee3ooyaeghang7sho7maiXeo5aex"))
      (compression lzma)

Be careful to put a set of parentheses around each configuration entry. White space isn't significant, so feel free to indent things and wrap them over lines if you want.

Keep copies of this file safe - you'll need it to do extractions! Print a copy out and lock it in your fire safe! Ok, currently, you might be able to recreate it if you remember where you put the storage, but if you use the `(encryption ...)` option, there's an encryption key to deal with as well.
## Your first backup

Think of a tag to identify the filesystem you're backing up. If it's `/home` on the server `gandalf`, you might call it `gandalf-home`. If it's the entire filesystem of the server `bilbo`, you might just call it `bilbo`.

Then from your shell, run (as root):

      # ugarit snapshot <ugarit.conf> [-c] [-a] <tag> <path to root of filesystem>

For example, if we have a `ugarit.conf` in the current directory:

      # ugarit snapshot ugarit.conf -c localhost-etc /etc

Specify the `-c` flag if you want to store ctimes in the archive; since it's impossible to restore ctimes when extracting from an archive, doing this is useful only for informational purposes, so it's not done by default. Similarly, atimes aren't stored in the archive unless you specify `-a`, because otherwise a lot of directory blocks would be uploaded on every snapshot, as the atime of every file will have been changed by the previous snapshot - so with `-a` specified, on every snapshot, every directory in your filesystem will be uploaded! Ugarit will happily restore atimes if they are found in an archive; their storage is made optional simply because uploading them is costly and rarely useful.
## Exploring the archive

Now you have a backup, you can explore the contents of the archive. This need not be done as root, as long as you can read `ugarit.conf`; however, if you want to extract files, run it as root.

      $ ugarit explore <ugarit.conf>

This will put you into an interactive shell exploring a virtual filesystem. The root directory contains an entry for every tag; if you type `ls` you should see your tag listed, and within that tag, you'll find a list of snapshots, in descending date order, with a special entry `current` for the most recent snapshot. Within a snapshot, you'll find the root directory of your snapshot, and will be able to `cd` into subdirectories, and so on:

      > ls
      Test <tag>
      > cd Test
      /Test> ls
      2009-01-24 10:28:16 <snapshot>
      2009-01-24 10:28:16 <snapshot>
      current <snapshot>
      /Test> cd current
      /Test/current> ls
      README.txt <file>
      LICENCE.txt <symlink>
      subdir <dir>
      .svn <dir>
      FIFO <fifo>
      chardev <character-device>
      blockdev <block-device>
      /Test/current> ls -ll LICENCE.txt
      lrwxr-xr-x 1000 100 2009-01-15 03:02:49 LICENCE.txt -> subdir/LICENCE.txt
      target: subdir/LICENCE.txt
      ctime: 1231988569.0

As well as exploring around, you can also extract files or directories (or entire snapshots) by using the `get` command. Ugarit will do its best to restore the metadata of files, subject to the rights of the user you run it as.

Type `help` to get help in the interactive shell.
## Duplicating tags

As mentioned above, you can duplicate a tag, creating two tags that refer to the same snapshot and its history but that can then have their own subsequent history of snapshots applied to each independently, with the following command:

      $ ugarit fork <ugarit.conf> <existing tag> <new tag>
## `.ugarit` files

By default, Ugarit will archive everything it finds in the filesystem tree you tell it to snapshot. However, this might not always be desired; so we provide the facility to override this with `.ugarit` files, or global rules in your `.conf` file.

Note: The syntax of these files is provisional, as I want to experiment with usability, and the current syntax is ugly. So please don't be surprised if the format changes in incompatible ways in subsequent versions!

In quick summary, if you want to ignore all files or directories matching a glob in the current directory and below, put the following in a `.ugarit` file in that directory:

      (* (glob "*~") exclude)

You can write quite complex expressions as well as just globs. The full set of rules is:

* `(glob "`*pattern*`")` matches files and directories whose names match the glob pattern
* `(name "`*name*`")` matches files and directories with exactly that name (useful for files called `*`...)
* `(modified-within ` *number* ` seconds)` matches files and directories modified within the given number of seconds
* `(modified-within ` *number* ` minutes)` matches files and directories modified within the given number of minutes
* `(modified-within ` *number* ` hours)` matches files and directories modified within the given number of hours
* `(modified-within ` *number* ` days)` matches files and directories modified within the given number of days
* `(not ` *rule*`)` matches files and directories that do not match the given rule
* `(and ` *rule* *rule...*`)` matches files and directories that match all of the given rules
* `(or ` *rule* *rule...*`)` matches files and directories that match any of the given rules

Also, you can override a previous exclusion with an explicit include in a lower-level directory:

    (* (glob "*~") include)

You can also bind rules to specific directories, rather than to "this directory and all beneath it", by specifying an absolute or relative path instead of the `*`:

    ("/etc" (name "passwd") exclude)

If you use a relative path, it's taken relative to the directory of the `.ugarit` file.

You can also put some rules in your `.conf` file, although relative paths are illegal there, by adding lines of this form to the file:

    (rule * (glob "*~") exclude)
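To make the rule combinators concrete, here is a minimal evaluator for the boolean subset of the rule language, written in Python for illustration (Ugarit's rules are s-expressions evaluated in Scheme; `modified-within` is omitted here as it needs filesystem timestamps):

```python
import fnmatch

def matches(rule, filename: str) -> bool:
    """Evaluate a rule (modelled as a nested tuple) against a filename."""
    op = rule[0]
    if op == "glob":
        return fnmatch.fnmatch(filename, rule[1])   # shell-style glob match
    if op == "name":
        return filename == rule[1]                  # exact name match
    if op == "not":
        return not matches(rule[1], filename)
    if op == "and":
        return all(matches(r, filename) for r in rule[1:])
    if op == "or":
        return any(matches(r, filename) for r in rule[1:])
    raise ValueError(f"unknown rule {op!r}")
```

For instance, `("or", ("name", "passwd"), ("glob", "*.bak"))` corresponds to the s-expression `(or (name "passwd") (glob "*.bak"))`.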
# Questions and Answers

## What happens if a snapshot is interrupted?

Nothing! Whatever blocks have been uploaded remain in the archive, but the snapshot is only added to the tag once the entire filesystem has been snapshotted. So just start the snapshot again. Any files that have already been uploaded will not need to be uploaded again, so the second snapshot should proceed quickly to the point where it failed before, and continue from there.

Unless the archive ends up with a partially-uploaded corrupted block due to being interrupted during an upload, you'll be fine. The filesystem backend has been written to avoid this by writing the block to a file with the wrong name, then renaming it to the correct name once it's entirely uploaded.
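The write-then-rename trick can be sketched as follows; this is an illustrative Python version of the general technique (the function name and `.tmp` suffix are made up, not Ugarit's actual code):

```python
import os

def put_block(directory: str, name: str, data: bytes) -> None:
    """Write a block under a temporary name, then rename it into place,
    so no reader ever sees a partial block under its final name."""
    tmp = os.path.join(directory, name + ".tmp")
    final = os.path.join(directory, name)
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # push the data to disk before the rename
    os.rename(tmp, final)             # atomic on POSIX filesystems
```

If the process dies mid-write, only the `.tmp` file is left behind; the final name either doesn't exist or holds a complete block.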
## Should I share a single large archive between all my filesystems?

I think so. Using a single large archive means that blocks shared between servers - eg, software installed from packages and that sort of thing - will only ever need to be uploaded once, saving storage space and upload bandwidth.
# Future Directions

Here's a list of planned developments, in approximate priority order:

## Backends

* Eradicate all GPL taint from gdbm by using sqlite for storing metadata in backends!

* Remove backend-log. Have just backend-fs, backend-splitlog, and maybe a backend-sqlite for everything-in-sqlite storage (plus future S3/SFTP backends). Not including meta-backends such as backend-cache and backend-replicated.

* Support for recreating the index and tags on a backend-log or backend-splitlog if they get corrupted, from the headers left in the log. Do this by extending the backend protocol with a special "admin" command that allows for arbitrary backend-specific operations, and write an ugarit-backend-admin CLI tool to administer backends with it.

* Support for unlinking in backend-splitlog, by marking byte ranges as unused in the metadata (and by touching the headers in the log so we maintain the invariant that the metadata is a reconstructible cache) and removing the entries for the unlinked blocks; perhaps provide an option to attempt to re-use existing holes to put blocks in for online reuse, and provide an offline compaction operation.

* Support for SFTP as a storage backend. Store one file per block, as per `backend-fs`, but remotely. See the SFTP protocol specs; popen an `ssh -s sftp` connection to the server then talk that simple binary protocol. Tada!

* Support for S3 as a storage backend. There is now an S3 egg!

* Support for replicated archives. This will involve a special storage backend that can wrap any number of other archives, each tagged with a trust percentage and read and write load weightings. Each block will be uploaded to enough archives to make the total trust be at least 100%, by randomly picking the archives weighted by their write load weighting. A local cache will be kept of which backends carry which blocks, and reads will be serviced by picking the archive that carries it and has the highest read load weighting. If that archive is unavailable or has lost the block, then the others will be tried in read load order; and if none of them have it, an exhaustive search of all available archives will be performed before giving up, and the cache updated with the results if the block is found. Users will be recommended to delete the cache if an archive is lost, so it gets recreated in usage, as otherwise the system may assume blocks are present when they are not, and thus fail to upload them when snapshotting. The individual physical archives that we put replication on top of won't be "valid" archives unless they are 100% replicated, as they'll contain references to blocks that are on other archives. It might be a good idea to mark them as such with a special tag to avoid people trying to restore directly from them. A copy of the replication configuration could be stored under a special tag to mark this fact, and to enable easy finding of the proper replicated archive to work from.
322## Core
324* Eradicate all GPL taint from gdbm by using sqlite for storing
325  the mtime cache!
327* Better error handling. Right now we give up if we can't read a file
328  or directory. It would be awesomer to print a warning but continue
329  to archive everything else.
331* More `.ugarit` actions. Right now we just have exclude and include;
332  we might specify less-safe operations such as commands to run before
333  and after snapshotting certain subtrees, or filters (don't send this
334  SVN repository; instead send the output of `svnadmin dump`),
335  etc. Running arbitrary commands is a security risk if random users
336  write their own `.ugarit` files - so we'd need some trust-based
337  mechanism; they'd need to be explicitly enabled in `ugarit.conf`,
338  then a `.ugarit` option could disable all unsafe operations in a
339  subtree.
341* Support for FFS flags, Mac OS X extended filesystem attributes, NTFS
342  ACLs/streams, FAT attributes, etc... Ben says to look at Box Backup
343  for some code to do that sort of thing.
345* Implement lock-tag! etc. in backend-fs, as a precaution against two
346  concurrent snapshots racing over updating the tag, where concurrent
347  access to the archive is even possible.
349* Deletion support - letting you remove snapshots. Perhaps you might
350  want to remove all snapshots older than a given number of days on a
351  given tag. Or just remove X out of Y snapshots older than a given
352  number of days on a given tag. We have the core support for this;
353  just find a snapshot and `unlink-directory!` it, leaving a dangling
354  pointer from the snapshot, and write the snapshot handling code to
355  expect this. Again, check Box Backup for that.
357* Some kind of accounting for storage usage by snapshot. It'd be nice
358  to track, as we write a snapshot to the archive, how many bytes we
359  reuse and how many we back up. We can then store this in the
360  snapshot metadata, and so report them somewhere. The blocks uploaded
361  by a snapshot may well then be reused by other snapshots later on,
362  so it wouldn't be a true measure of 'unique storage', nor a measure
363  of what you'd reclaim by deleting that snapshot, but it'd be
364  interesting anyway.
366* Option, when backing up, to not cross mountpoints
368* Option, when backing up, to store inode number and mountpoint path
369  in directory entries, and then when extracting, keeping a dictionary
370  of this unique identifier to pathname, so that if a file to be
371  extracted is already in the dictionary and the hash is the same, a
372  hardlink can be created.
* Archival mode as well as snapshot mode. Whereas a snapshot record
  takes a filesystem tree and adds it to a chain of snapshots of the
  same filesystem tree, archival mode takes a filesystem tree and
  inserts it into a search tree anchored on the specified tag,
  indexing it on a list of key+value properties supplied at archival
  time. An archive tag is represented in the virtual filesystem as a
  directory full of archive objects, each identified by their full
  hash. Each archive object references the filesystem root as well as
  the key+value properties, and optionally a parent link like a
  snapshot, as an archive can be made that explicitly replaces an
  earlier one and should replace it in the index. There is also a
  virtual directory for each indexed property which contains a
  directory for each value of the property, full of symlinks to the
  archive objects, and subdirectories that allow multi-property
  searches on other properties. The index itself is stored as a B-Tree
  with a reasonably small block size; when it's updated, the modified
  index blocks are replaced, thereby gaining new hashes, so their
  parents need replacing, all the way up the tree until a new root
  block is created. The existing block unlink mechanism in the
  backends will reclaim storage for blocks that are superseded, if the
  backend supports it. When this is done, ugarit will offer the option
  of snapshotting to a snapshot tag, or archiving to an archive tag,
  or archiving to an archive tag while replacing a specified archive
  object (nominated by path within the tag), which causes it to be
  removed from the index (except from the directory listing all
  archives by hash), and the new archive object is inserted,
  referencing the old one as a parent.
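
  The property-indexed virtual directories described above can be
  sketched as follows (a Python illustration with invented naming,
  not Ugarit's actual layout): each archive object appears once under
  every ordering of every non-empty subset of its properties, which
  is what makes multi-property drill-down searches possible.

  ```python
  from itertools import permutations

  def index_paths(obj_hash, props):
      """Enumerate the virtual-filesystem paths under which an archive
      object would appear: key/value/.../hash for each ordering of
      each non-empty subset of its indexed properties."""
      paths = []
      keys = sorted(props)
      for r in range(1, len(keys) + 1):
          for order in permutations(keys, r):
              parts = []
              for k in order:
                  parts += [k, str(props[k])]
              paths.append("/".join(parts + [obj_hash]))
      return paths
  ```

  So an object with properties `artist=x, album=y` would be reachable
  both as `artist/x/...` and as `album/y/artist/x/...`, narrowing the
  listing at each level.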
* Dump/restore format. On a dump, walk an arbitrary subtree of an
  archive, serialising objects. Do not put any hashes in the dump
  format - dump out entire files, and just identify objects with
  sequential numbers when forming the directory / snapshot trees. On a
  restore, read the same format and slide it into an archive (creating
  any required top-level snapshot objects if the dump doesn't start
  from a snapshot), putting it onto a specified tag. The intention is
  that this format can be used to migrate your stuff between archives,
  perhaps to change to a better backend.
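
  The sequential-numbering idea can be sketched like so (a Python
  illustration with hypothetical tuple shapes, not a real Ugarit
  format): objects are emitted bottom-up, each tagged with the next
  sequence number, so a directory can refer to its children by the
  numbers they were just emitted under, with no hashes involved.

  ```python
  def dump_tree(node, out, counter=None):
      """Serialise a subtree bottom-up with sequential object numbers.
      `node` is a hypothetical ("file", name, bytes) or
      ("dir", name, [children]) tuple; `out` collects emitted records."""
      if counter is None:
          counter = [0]
      kind, name = node[0], node[1]
      if kind == "file":
          out.append(("file", counter[0], name, node[2]))
      else:
          child_ids = []
          for child in node[2]:
              dump_tree(child, out, counter)
              child_ids.append(out[-1][1])   # id of the child just emitted
          out.append(("dir", counter[0], name, child_ids))
      counter[0] += 1
      return out
  ```

  A restore would read records in order, re-uploading each file and
  resolving the sequence numbers back into fresh hashes in the target
  archive.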
## Front-end

* Better error messages
* FUSE support. Mount it as a read-only filesystem :-D Then consider
  adding Fossil-style writing to the `current` of a snapshot, with
  copy-on-write of blocks to a buffer area on the local disk, then the
  option to make a snapshot of `current`.
* Filesystem watching. Even with the hash-caching trick, a snapshot
  will still involve walking the entire directory tree and looking up
  every file in the hash cache. We can do better than that - some
  platforms provide an interface for receiving real-time notifications
  of changed or added files. Using this, we could allow ugarit to run
  in continuous mode, keeping a log of file notifications from the OS
  while it does an initial full snapshot. It can then wait for a
  specified period (one hour, perhaps?), accumulating names of files
  changed since it started, before then creating a new snapshot by
  uploading just the files it knows to have changed, while subsequent
  file change notifications go to a new list.
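
  The "accumulate on one list, snapshot from it while a new list
  fills" bookkeeping is the crux of continuous mode. A minimal sketch
  of it, in Python with invented names:

  ```python
  import threading

  class ChangeAccumulator:
      """Sketch of continuous-mode bookkeeping: OS change
      notifications land in one set while a snapshot runs; when the
      settle period expires, the set is swapped for a fresh one and
      only the old set's paths are uploaded."""
      def __init__(self):
          self.lock = threading.Lock()
          self.changed = set()

      def notify(self, path):
          """Called from the watcher thread on each OS notification."""
          with self.lock:
              self.changed.add(path)

      def take_batch(self):
          """Called when the wait period ends; atomically swaps in an
          empty set so later notifications go to the new list."""
          with self.lock:
              batch, self.changed = self.changed, set()
          return batch
  ```

  Using a set also deduplicates repeated notifications for the same
  file, so a busy log file costs one upload per snapshot, not one per
  write.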
## Testing
* An option to verify a snapshot, walking every block in it, checking
  that there are no dangling references and that everything matches
  its hash, without needing to put it into a filesystem, and applying
  any other sanity checks we can think of en route. Optionally compare
  it to an on-disk filesystem, while we're at it.
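
  Since the storage is content-addressed, the core of such a verify
  pass is just a graph walk where each block's key doubles as its
  expected hash. A sketch in Python, modelling the archive as a plain
  dict (hypothetical shape, SHA-256 assumed):

  ```python
  import hashlib

  def verify(archive, key, seen=None):
      """Walk a snapshot's block graph from its root key, checking
      that every referenced block exists and that its contents match
      the key it is stored under. `archive` is a hypothetical dict
      mapping hex hash -> (bytes, [child hashes])."""
      if seen is None:
          seen = set()
      if key in seen:
          return                      # shared block, already verified
      if key not in archive:
          raise ValueError("dangling reference: " + key)
      data, children = archive[key]
      if hashlib.sha256(data).hexdigest() != key:
          raise ValueError("corrupt block: " + key)
      seen.add(key)
      for child in children:
          verify(archive, child, seen)
  ```

  The `seen` set means shared blocks are checked once, so the walk is
  linear in the number of distinct blocks rather than the number of
  references.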
* A more formal test corpus with a unit test script around the
  `ugarit` command-line tool; the corpus should contain a mix of tiny
  and huge files and directories, awkward cases for sharing of blocks
  (many identical files in the same dir, etc), complex forms of file
  metadata, and so on. It should archive and restore the corpus
  several times over with each hash, compression, and encryption
  option.

# Acknowledgements
The original idea came from Venti, a content-addressed storage system
from Plan 9. Venti is usable directly by user applications, and is
also integrated with the Fossil filesystem to support snapshotting the
status of a Fossil filesystem. Fossil allows references to be either
to a block number on the Fossil partition or to a Venti key; so when a
filesystem has been snapshotted, all it now contains is a "root
directory" pointer into the Venti archive, and any files modified
thereafter are copied-on-write into Fossil, where they may be modified
until the next snapshot.
We're nowhere near that exciting yet, but using FUSE, we might be able
to do something similar, which might be fun. However, Venti inspired
me when I read about it years ago; it showed me how elegant
content-addressed storage is. Finding out that the Git version control
system used the same basic tricks really just confirmed this for me.
Also, I'd like to tip my hat to Duplicity. With the changing economics
of storage presented by services like Amazon S3, I looked to Duplicity
as it provided both SFTP and S3 backends. However, it worked in terms
of full and incremental backups, a model that I think made sense for
magnetic tapes, but loses out to content-addressed snapshots when you
have random-access media. Duplicity inspired me by its adoption of
multiple backends, the very backends I want to use, but I still
hungered for a content-addressed snapshot store.
I'd also like to tip my hat to Box Backup. I've only used it a little,
because it requires a special server to manage the storage (and I want
to get my backups *off* of my servers), but it also inspires me with
directions I'd like to take Ugarit. It's much more aware of real-time
access to random-access storage than Duplicity, and has a very
interesting continuous background incremental backup mode, moving away
from the tape-based paradigm of backups as something you do on a
special day of the week, like some kind of religious observance. I
hope the author Ben, who is a good friend of mine, won't mind me
plundering his source code for details on how to request real-time
notification of changes from the filesystem, and how to read and write
extended attributes!

Moving on from the world of backup, I'd like to thank the Chicken Team
for producing Chicken Scheme. Felix, Peter, Elf, and Alex have
particularly inspired me with their can-do attitudes to combining
programming-language elegance and pragmatic engineering - two things
many would think un-unitable enemies. Of course, they didn't do it all
themselves - R5RS Scheme and the SRFIs provided a solid foundation to
build on, and there's a cast of many more in the Chicken community,
working on other bits of Chicken or just egging everyone on. And I
can't not thank Henry Baker for writing the seminal paper on the
technique Chicken uses to implement full tail-calling Scheme with
cheap continuations on top of C; Henry already had my admiration for
his work on combining elegance and pragmatism in linear logic. Why
doesn't he return my calls? I even sent flowers.
A special thanks should go to Christian Kellermann for porting Ugarit
to use Chicken 4 modules, too, which was otherwise a big bottleneck to
development, as I was stuck on Chicken 3 for some time!

Thanks to the early adopters who brought me useful feedback, too!

And I'd like to thank my wife for putting up with me spending several
evenings and weekends and holiday days working on this thing...

# Version history

* 0.8: decoupling backends from the core and into separate binaries,
  accessed via standard input and output, so they can be run over SSH
  tunnels and other such magic.

* 0.7: file cache support, sorting of directories so they're archived
  in canonical order, autoloading of hash/encryption/compression
  modules so they're not required dependencies any more.

* 0.6: .ugarit support.

* 0.5: Keyed hashing so attackers can't tell what blocks you have,
  markers in logs so the index can be reconstructed, sha2 support, and
  passphrase support.

* 0.4: AES encryption.

* 0.3: Added splitlog backend, and fixed a .meta file typo.

* 0.2: Initial public release.

* 0.1: Internal development release.