# Introduction

Ugarit is a backup/archival system based around content-addressable storage.

This allows it to upload incremental backups to a remote server or a local filesystem such as an NFS share or a removable hard disk, yet have the archive instantly able to produce a full snapshot on demand, rather than needing to download a full snapshot plus all the incrementals since. The content-addressable storage technique means that the incrementals can be applied to a snapshot on various kinds of storage without needing intelligence in the storage itself - so the snapshots can live within Amazon S3 or on a removable hard disk.

Also, the same storage can be shared between multiple systems that all back up to it - and the incremental upload algorithm means that any files shared between the servers will only need to be uploaded once. If you back up a complete server, then go and back up another that is running the same distribution, all the files in `/bin` and so on that are already in the storage will not need to be backed up again; the system will automatically spot that they're already there, and not upload them again.

## So what's that mean in practice?

You can run Ugarit to back up any number of filesystems to a shared archive, and on every backup, Ugarit will only upload files or parts of files that aren't already in the archive - be they from the previous snapshot, earlier snapshots, snapshots of entirely unrelated filesystems, etc. Every time you take a snapshot, Ugarit builds a complete directory tree of the snapshot in the archive - but it reuses any parts of files, whole files, or entire directories that already exist anywhere in the archive, and only uploads what doesn't already exist.

The support for parts of files means that, in many cases, gigantic files like database tables and virtual disks for virtual machines will not need to be uploaded entirely every time they change; only the changed sections will be identified and uploaded.

Because a complete directory tree exists in the archive for every snapshot, the extraction algorithm is incredibly simple - and, therefore, incredibly reliable and fast. Simple, reliable, and fast are just what you need when you're trying to reconstruct the filesystem of a live server.

Also, it means that you can do lots of small snapshots. If you run a snapshot every hour, then only a megabyte or two might have changed in your filesystem, so you only upload a megabyte or two - yet you end up with a complete history of your filesystem at hourly intervals in the archive.

Conventional backup systems usually store a full backup followed by incrementals, meaning that a restore involves reading the full backup and then reading and applying every incremental since. So to do a restore, you either have to download *every version* of the filesystem you've ever uploaded, or you have to do periodic full backups (even though most of your filesystem won't have changed since the last full backup) to reduce the number of incrementals required for a restore. Better results are had from systems that use a special backup server to look after the archive storage, accepting incremental backups and applying them to a stored snapshot in order to maintain a most-recent snapshot that can be downloaded in a single run; but those systems restrict you to using dedicated servers as your archive stores, ruling out cheap scalable solutions like Amazon S3, or just backing up to a removable USB or eSATA disk you attach to your system whenever you do a backup. And dedicated backup servers are complex pieces of software; can you rely on something complex for the fundamental foundation of your data security system?

## System Requirements

Ugarit should run on any POSIX-compliant system that can run [Chicken Scheme](http://www.call-with-current-continuation.org/). It stores and restores all the file attributes reported by the `stat` system call - POSIX mode permissions, UID, GID, mtime, and optionally atime and ctime (although the ctime cannot be restored, due to POSIX restrictions). Ugarit will store files, directories, block and character special files, symlinks, and FIFOs.

Support for extended filesystem attributes - ACLs, alternative streams, forks, and other metadata - is possible, thanks to the extensible directory entry format; support for such metadata will be added as required.

Currently, only local filesystem-based archive storage backends are complete: these are suitable for backing up to a removable hard disk or a filesystem shared via NFS or other protocols. They can also be used to snapshot to local disks, although this is obviously then vulnerable to local system failures; if the computer that's being backed up catches fire, you won't be able to restore it from archives that were also ruined!

However, the next backends to be implemented will be one for Amazon S3, and an SFTP backend for storing archives anywhere you can ssh to. Other backends will be implemented on demand; an archive can, in principle, be stored on anything that can store files by name, report on whether a file already exists, and efficiently download a file by name. This rules out magnetic tapes, due to their requirement for sequential access.

Although we need to trust that a backend won't lose data (for now), we don't need to trust the backend not to snoop on us, as Ugarit optionally encrypts everything sent to the archive.

## What's in an archive?

An Ugarit archive contains a load of blocks, each up to a maximum size (usually 1MiB, although other backends might impose smaller limits). Each block is identified by the Tiger hash of its contents; this is how Ugarit avoids ever uploading the same data twice - before uploading anything, it looks up the hash to see whether that data already exists in the archive. The contents of the blocks are compressed and then encrypted before upload.
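
In outline, that check-before-upload logic looks something like this sketch (all helper names here are invented for illustration - this is not Ugarit's actual internal API):

      ;; Hypothetical sketch: upload a block only if its hash isn't
      ;; already present in the archive. All helper names are invented.
      (define (store-block! archive type data)
        (let ((key (keyed-hash data type)))      ; the hash names the block
          (unless (block-exists? archive key)    ; ask the backend first
            (upload-block! archive key type (encrypt (compress data))))
          key))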

Every file uploaded is, unless it's small enough to fit in a single block, chopped into blocks, and each block uploaded. This way, the entire contents of your filesystem can be uploaded - or, at least, only the parts of it that aren't already there! The blocks are then tied together to create a snapshot by uploading blocks full of the Tiger hashes of the data blocks, and directory blocks listing the names and attributes of files in directories, along with the hashes of the blocks that contain the files' contents. Even the blocks that contain lists of hashes of other blocks are subject to checking for pre-existence in the archive; if only a few MiB of your hundred-GiB filesystem have changed, then even the index blocks and directory blocks are re-used from previous snapshots.
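
For instance, chopping a file into blocks and tying them together with an index block of hashes might look like this sketch, reusing the hypothetical `store-block!` from above (`serialise` is likewise invented):

      ;; Hypothetical sketch: store a file as data blocks plus one
      ;; index block listing their hashes, in order.
      (define (store-file-contents! archive port)
        (let loop ((hashes '()))
          (let ((chunk (read-string (* 1024 1024) port))) ; up to 1MiB
            (if (or (eof-object? chunk) (equal? chunk ""))
                (store-block! archive 'index
                              (serialise (reverse hashes)))
                (loop (cons (store-block! archive 'data chunk)
                            hashes))))))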

Once uploaded, a block in the archive is never again changed. After all, if its contents changed, its hash would change, so it would no longer be the same block! However, every block has a reference count, tracking the number of index blocks that refer to it. This means that the archive knows which blocks are shared between multiple snapshots (or shared *within* a snapshot - if a filesystem has more than one copy of the same file, still only one copy is uploaded), so that if a given snapshot is deleted, the blocks that only that snapshot uses can be deleted to free up space, without corrupting other snapshots by deleting blocks they share. Bear in mind, however, that not all storage backends may support this - there are certain advantages to being an append-only archive. For a start, you can't delete something by accident! The supplied filesystem backend supports deletion, while the logfile backend does not. However, the actual deletion command hasn't been implemented yet either, so it's a moot point for now...
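
The reference-counting rule amounts to something like the following sketch (hypothetical helper names again; as noted, deletion isn't actually wired up yet):

      ;; Hypothetical sketch: a block is only really deleted when the
      ;; last reference to it goes away.
      (define (unlink-block! archive key)
        (let ((refs (- (block-refcount archive key) 1)))
          (set-block-refcount! archive key refs)
          (when (zero? refs)
            (delete-block! archive key))))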

Finally, the archive contains objects called tags. Unlike the blocks, a tag's contents can change, and tags have meaningful names rather than being identified by hash. Tags identify the top-level blocks of snapshots within the system, from which (by following the chain of hashes down through the index blocks) the entire contents of a snapshot may be found. Unless you happen to have recorded the hash of a snapshot somewhere, the tags are where you find snapshots when you want to do a restore!

Whenever a snapshot is taken, as soon as Ugarit has uploaded all the files, directories, and index blocks required, it looks up the tag you have identified as the target of the snapshot. If the tag already exists, then the snapshot it currently points to is recorded in the new snapshot as the "previous snapshot"; a snapshot header containing the previous snapshot's hash, along with the date and time and any comments you provide for the snapshot, is then uploaded (as another block, identified by its hash). The tag is then updated to point to the new snapshot.
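
In outline (hypothetical helpers once more), that tag update is:

      ;; Hypothetical sketch: chain a new snapshot onto a tag's history.
      (define (snapshot-to-tag! archive tag root-hash comment)
        (let* ((previous (read-tag archive tag))   ; #f if the tag is new
               (header (serialise
                        `((root . ,root-hash)
                          (previous . ,previous)
                          (when . ,(current-seconds))
                          (comment . ,comment))))
               (snapshot-key (store-block! archive 'snapshot header)))
          (set-tag! archive tag snapshot-key)))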

This way, each tag actually identifies a chronological chain of snapshots. Normally, you would use a tag to identify a filesystem being archived; you'd keep snapshotting the filesystem to the same tag, resulting in all the snapshots of that filesystem hanging from the tag. But if you want to remember any particular snapshot (perhaps one you take before a big upgrade or other risky operation), you can duplicate the tag, in effect 'forking' the chain of snapshots, much like a branch in a version control system.

# Using Ugarit

## Installation

Install [Chicken Scheme](http://www.call-with-current-continuation.org/) using their [installation instructions](http://chicken.wiki.br/Getting%20started#Installing%20Chicken).

Ugarit can then be installed by typing (as root):

    chicken-install ugarit

See the [chicken-install manual](http://wiki.call-cc.org/manual/Extensions#chicken-install-reference) for details if you have any trouble, or wish to install into your home directory.

## Setting up an archive

Firstly, you need to know the archive identifier for the place you'll be storing your archives. This depends on your backend.

### Filesystem backend

The filesystem backend creates archives by storing each block or tag in its own file, in a directory. To keep the objects-per-directory count down, it'll split the files into subdirectories.

To set up a new filesystem-backend archive, just create an empty directory that Ugarit will have write access to when it runs. It will probably run as root in order to be able to access the contents of files that aren't world-readable (although that's up to you), so be careful of NFS mounts that have `maproot=nobody` set!

You can then refer to it using the following archive identifier:

      fs "...path to directory..."
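
For example, if you created the directory `/mnt/backup/ugarit` (a purely illustrative path), the identifier would be:

      fs "/mnt/backup/ugarit"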

### New Logfile backend

The logfile backend works much like the original Venti system. It's append-only - you won't be able to delete old snapshots from a logfile archive, even when I implement deletion. It stores the archive in two sets of files: one is a log of data blocks, split at a specified maximum size, and the other is the metadata - a GDBM file used as an index to locate blocks in the logfiles and to store the blocks' types, a GDBM file of tags, and a counter file used in naming logfiles.

To set up a new logfile archive, just choose where to put the two sets of files. It would be nice to put the metadata on a different physical disk to the logs, to reduce seeking. Create a directory for each, or, if you only have one disk, you can put them all in the same directory.

You can then refer to it using the following archive identifier:

      splitlog "...log directory..." "...metadata directory..." max-logfile-size

For most platforms, a max-logfile-size of 900000000 (900 MB) should suffice. For now, don't go much bigger than that on 32-bit systems until Chicken's `file-position` function is fixed to work with files larger than 1GB.
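
For example (with purely illustrative paths):

      splitlog "/backup/ugarit/logs" "/backup/ugarit/meta" 900000000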

### Old Logfile backend

The old logfile backend works much like the original Venti system. It's append-only - you won't be able to delete old snapshots from a logfile archive, even when I implement deletion. It stores the archive in three files: one is a log of data blocks, one is a GDBM index that remembers where in the log each block resides, and one is a GDBM of tags.

This worked well, but exposed a bug in Chicken when dealing with files larger than about a gigabyte on 32-bit platforms. I fixed that in short order, but it reminded me that some platforms don't like files larger than 2GB anyway, so I wrote a new logfile backend that splits the log file into chunks at a specified size. You probably want to use the new backend - the old backend is kept for compatibility only.

To set up an old logfile archive, just choose where to put the three files. It would be nice to put the index and tags on a different physical disk to the log, to reduce seeking.

You can then refer to it using the following archive identifier:

      log "...logfile..." "...indexfile..." "...tagsfile..."

None of the files need to exist in advance; Ugarit will create them.

## Writing a ugarit.conf

`ugarit.conf` should look something like this:

      (storage <archive identifier>)
      (hash tiger "<A secret string>")
      [(compression [deflate|lzma])]
      [(encryption aes <key>)]
      [(file-cache "<path>")]
      [(rule ...)]

The hash line chooses a hash algorithm. Currently Tiger-192 (`tiger`), SHA-256 (`sha256`), SHA-384 (`sha384`), and SHA-512 (`sha512`) are supported; if you omit the line, then Tiger will still be used, but as a simple hash of the block with the block type appended. That reveals to attackers what blocks you have (as the hash is of the unencrypted block, and the hash is not encrypted), which is fine for development and testing or for use with trusted archives, but not advised for archives that attackers may snoop at. Providing a secret string produces a hash function that hashes the block, the type of block, and the secret string together, producing hashes that attackers who can snoop the archive cannot use to find known blocks. Whichever hash function you use, you will need to install the required Chicken egg with one of the following commands:

    sudo chicken-install tiger-hash  # for tiger
    sudo chicken-install sha2        # for the SHA hashes
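
To illustrate the keyed hashing described above (a sketch of the idea only, not the `tiger-hash` egg's actual API; here `digest` stands for whichever plain hash function is configured):

      ;; Hypothetical sketch: mix the block, its type, and the secret
      ;; into one hash, so that snoopers on the archive can't recognise
      ;; known blocks by hashing candidate data themselves.
      (define (make-keyed-hash digest secret)
        (lambda (data type)
          (digest (string-append data (symbol->string type) secret))))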

`lzma` is the recommended compression option for low-bandwidth backends or when space is tight, but it's very slow to compress; deflate, or no compression at all, is better for fast local archives. To have no compression at all, just remove the `(compression ...)` line entirely. Likewise, to use compression, you need to install the corresponding Chicken egg:

       sudo chicken-install z3       # for deflate
       sudo chicken-install lzma     # for lzma

Likewise, the `(encryption ...)` line may be omitted to have no encryption; the only currently supported algorithm is aes (in CBC mode), with a key given in hex, as a passphrase (hashed to get a key), or as a passphrase read from the terminal on every run. The key may be 16, 24, or 32 bytes, for 128-bit, 192-bit, or 256-bit AES. To specify a hex key, just supply it as a string, like so:

      (encryption aes "00112233445566778899AABBCCDDEEFF")

...for 128-bit AES,

      (encryption aes "00112233445566778899AABBCCDDEEFF0011223344556677")

...for 192-bit AES, or

      (encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")

...for 256-bit AES.

Alternatively, you can provide a passphrase, and specify how large a key you want it turned into, like so:

      (encryption aes ([16|24|32] "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))

Finally, the extra-paranoid can request that Ugarit prompt for a passphrase on every run and hash it into a key of the specified length, like so:

      (encryption aes ([16|24|32] prompt))

(note the lack of quotes around `prompt`, distinguishing it from a passphrase)

Again, as encryption is an optional feature, you must install the appropriate Chicken egg to use it:

       sudo chicken-install aes

A file cache, if enabled, significantly speeds up subsequent snapshots of a filesystem tree. The file cache is a file (which Ugarit will create if it doesn't already exist) mapping filenames to (mtime,hash) pairs; as it scans the filesystem, if Ugarit finds a file in the cache and the mtime has not changed, it will assume it is already archived under the specified hash. This saves it from having to read the entire file to hash it and then check whether the hash is present in the archive. In other words, if only a few files have changed since the last snapshot, then snapshotting a directory tree becomes an O(N) operation, where N is the number of files, rather than an O(M) operation, where M is the total size of the files involved.
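
That check amounts to something like this sketch (`file-modification-time` is Chicken's posix procedure; the other helpers are invented for illustration):

      ;; Hypothetical sketch: trust the cached hash if the mtime is
      ;; unchanged; otherwise hash/store the file and update the cache.
      (define (cached-hash-or-store! cache archive path)
        (let ((entry (cache-lookup cache path))   ; (mtime . hash) or #f
              (mtime (file-modification-time path)))
          (if (and entry (= (car entry) mtime))
              (cdr entry)                         ; unchanged: reuse hash
              (let ((hash (store-file! archive path)))
                (cache-store! cache path (cons mtime hash))
                hash))))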

For example, a complete `ugarit.conf` might look like this:

      (storage splitlog "/net/spiderman/archive/logs" "/net/spiderman/archive/index" 900000000)
      (hash tiger "Giung0ahKahsh9ahphu5EiGhAhth4eeyDahs2aiWAlohr6raYeequ8uiUr3Oojoh")
      (encryption aes (32 "deing2Aechediequohdo6Thuvu0OLoh6fohngio9koush9euX6el9iesh6Aef4augh3WiY7phahmesh2Theeziniem5hushai5zigushohnah1quae1ooXo0eingu1Aifeo1eeSheaz9ieSie9tieneibeiPho0quu6um8weiyagh4kaeshooThooNgeyoul2Ahsahgh8imohw3hoyazai9gaph5ohhaechiedeenusaeghahghipe8ii3oo9choh5cieth5iev3jiedohquai4Thiedah5sah5kohcepheixai3aiPainozooc6zohNeiy6Jeigeesie5eithoo0ciiNae8Nee3eiSuKaiza0VaiPai2eeFooNgeengaif9yaiv9rathuoQuohy0ohth6OiL9aisaetheeWoh9aiQu0yoo6aequ3quoiChi7joonohwuvaipeuh2eiPoogh1Ie8tiequesoshaeBue5ieca8eerah0quieJoNoh3Jiesh1chei8weidixeen1yah1ioChie0xaimahWeeriex5eetiichahP9iey5ux7ahGhei7eejahxooch5eiqu0Pheir9Reiri4ahqueijuchae8eeyieMeixa4ciisioloe9oaroof1eegh4idaeNg5aepeip8mah7ixaiSohtoxaiH4oe5eeGoh4eemu7mee8ietaecu6Zoodoo0hoP5uquaish2ahc7nooshi0Aidae2Zee4pheeZee3taerae6Aepu2Ayaith2iivohp8Wuikohvae2Peange6zeihep8eC9mee8johshaech1Ubohd4Ko5caequaezaigohyai1TheeN6Gohva6jinguev4oox2eet5auv0aiyeo7eJieGheebaeMahshifaeDohy8quut4ueFei3eiCheimoechoo2EegiveeDah1sohs7ezee3oaWa2iiv2Chi1haiS5ahph4phu5su0hiocee3ooyaeghang7sho7maiXeo5aex"))
      (compression lzma)

Be careful to put a set of parentheses around each configuration entry. White space isn't significant, so feel free to indent things and wrap them over lines if you want.

Keep copies of this file safe - you'll need it to do extractions! Print a copy out and lock it in your fire safe! You might be able to recreate the `(storage ...)` line if you remember where you put the archive, but the secrets in the `(hash ...)` and `(encryption ...)` lines are not recoverable; in particular, without the encryption key, an encrypted archive cannot be read.

## Your first backup

Think of a tag to identify the filesystem you're backing up. If it's `/home` on the server `gandalf`, you might call it `gandalf-home`. If it's the entire filesystem of the server `bilbo`, you might just call it `bilbo`.

Then from your shell, run (as root):

      # ugarit snapshot <ugarit.conf> [-c] [-a] <tag> <path to root of filesystem>

For example, if we have a `ugarit.conf` in the current directory:

      # ugarit snapshot ugarit.conf -c localhost-etc /etc

Specify the `-c` flag if you want to store ctimes in the archive; since it's impossible to restore ctimes when extracting from an archive, doing this is useful only for informational purposes, so it's not done by default. Similarly, atimes aren't stored in the archive unless you specify `-a`; otherwise, a lot of directory blocks would be uploaded on every snapshot, as the atime of every file will have been changed by the previous snapshot - so with `-a` specified, on every snapshot, every directory in your filesystem will be uploaded! Ugarit will happily restore atimes if they are found in an archive; their storage is made optional simply because uploading them is costly and rarely useful.

## Exploring the archive

Now you have a backup, you can explore the contents of the archive. This need not be done as root, as long as you can read `ugarit.conf`; however, if you want to extract files, run it as root.

      $ ugarit explore <ugarit.conf>

This will put you into an interactive shell exploring a virtual filesystem. The root directory contains an entry for every tag; if you type `ls` you should see your tag listed, and within that tag, you'll find a list of snapshots, in descending date order, with a special entry `current` for the most recent snapshot. Within a snapshot, you'll find the root directory of your snapshot, and will be able to `cd` into subdirectories, and so on:

      > ls
      Test <tag>
      > cd Test
      /Test> ls
      2009-01-24 10:28:16 <snapshot>
      2009-01-24 10:28:16 <snapshot>
      current <snapshot>
      /Test> cd current
      /Test/current> ls
      README.txt <file>
      LICENCE.txt <symlink>
      subdir <dir>
      .svn <dir>
      FIFO <fifo>
      chardev <character-device>
      blockdev <block-device>
      /Test/current> ls -ll LICENCE.txt
      lrwxr-xr-x 1000 100 2009-01-15 03:02:49 LICENCE.txt -> subdir/LICENCE.txt
      target: subdir/LICENCE.txt
      ctime: 1231988569.0

As well as exploring around, you can also extract files or directories (or entire snapshots) by using the `get` command. Ugarit will do its best to restore the metadata of files, subject to the rights of the user you run it as.

Type `help` to get help in the interactive shell.

## Duplicating tags

As mentioned above, you can duplicate a tag, creating two tags that refer to the same snapshot and its history, but that can then each have their own subsequent history of snapshots applied independently, with the following command:

      $ ugarit fork <ugarit.conf> <existing tag> <new tag>

## `.ugarit` files

By default, Ugarit will archive everything it finds in the filesystem tree you tell it to snapshot. However, this might not always be desired; so we provide the facility to override this with `.ugarit` files, or global rules in your `ugarit.conf` file.

Note: The syntax of these files is provisional; I want to experiment with usability, as the current syntax is ugly. So please don't be surprised if the format changes in incompatible ways in subsequent versions!

In quick summary, if you want to ignore all files or directories matching a glob in the current directory and below, put the following in a `.ugarit` file in that directory:

      (* (glob "*~") exclude)

You can write quite complex expressions as well as just globs; there's a combined example after the list. The full set of rules is:

* `(glob "`*pattern*`")` matches files and directories whose names match the glob pattern
* `(name "`*name*`")` matches files and directories with exactly that name (useful for files called `*`...)
* `(modified-within ` *number* ` seconds)` matches files and directories modified within the given number of seconds
* `(modified-within ` *number* ` minutes)` matches files and directories modified within the given number of minutes
* `(modified-within ` *number* ` hours)` matches files and directories modified within the given number of hours
* `(modified-within ` *number* ` days)` matches files and directories modified within the given number of days
* `(not ` *rule*`)` matches files and directories that do not match the given rule
* `(and ` *rule* *rule...*`)` matches files and directories that match all the given rules
* `(or ` *rule* *rule...*`)` matches files and directories that match any of the given rules
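
For example, combining these rules, the following (illustrative) entry would exclude editor backup files not touched within the last week:

      (* (and (glob "*~") (not (modified-within 7 days))) exclude)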

Also, you can override a previous exclusion with an explicit include in a lower-level directory:

    (* (glob "*~") include)

Also, you can bind rules to specific directories, rather than to "this directory and all beneath it", by specifying an absolute or relative path instead of the `*`:

    ("/etc" (name "passwd") exclude)

If you use a relative path, it's taken relative to the directory of the `.ugarit` file.

You can also put some rules in your `ugarit.conf` file, although relative paths are illegal there, by adding lines of this form to the file:

    (rule * (glob "*~") exclude)

# Questions and Answers

## What happens if a snapshot is interrupted?

Nothing! Whatever blocks have been uploaded will remain in the archive, but the snapshot is only added to the tag once the entire filesystem has been snapshotted. So just start the snapshot again. Any files that have already been uploaded will then not need to be uploaded again, so the second snapshot should proceed quickly to the point where it failed before, and continue from there.

Unless the archive ends up with a partially-uploaded corrupted block due to being interrupted during upload, you'll be fine. The filesystem backend has been written to avoid this by writing the block to a file with the wrong name, then renaming it to the correct name once it's entirely uploaded.
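
That write-then-rename trick looks something like this sketch (illustrative only; the real backend's details may differ):

      ;; Hypothetical sketch: write the block under a temporary name,
      ;; then rename it into place, so a half-written block never
      ;; appears under its real name. (rename-file stands for your
      ;; Scheme's POSIX rename operation.)
      (define (write-block-file! path data)
        (let ((tmp (string-append path ".tmp")))
          (with-output-to-file tmp (lambda () (display data)))
          (rename-file tmp path)))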

## Should I share a single large archive between all my filesystems?

I think so. Using a single large archive means that blocks shared between servers - eg, software installed from packages and that sort of thing - will only ever need to be uploaded once, saving storage space and upload bandwidth.

# Future Directions

Here's a list of planned developments, in approximate priority order:

## Backends

* Support for remote backends. This will involve splitting the
  backends into separate executables, and having the frontend talk to
  them via a simple protocol over standard input and output. Then it
  will be possible to use ssh to talk to backends on remote machines,
  as well as various other interesting integration opportunities.

* Support for SFTP as a storage backend. Store one file per block, as
  per `backend-fs`, but remotely. See
  http://tools.ietf.org/html/draft-ietf-secsh-filexfer-13 for the sftp
  protocol specs; popen an `ssh -s sftp` connection to the server, then
  talk that simple binary protocol. Tada!

* Support for S3 as a storage backend. What's the best way to get at
  the S3 API? Write our own client, or find a C library to wrap?

* Support for recreating the index and tags on a backend-log or
  backend-splitlog if they get corrupted, from the headers left in the
  log.

* Support for replicated archives. This will involve a special storage
  backend that can wrap any number of other archives, each tagged with
  a trust percentage and read and write load weightings. Each block
  will be uploaded to enough archives to make the total trust be at
  least 100%, by randomly picking the archives, weighted by their write
  load weighting. A local cache will be kept of which backends carry
  which blocks, and reads will be serviced by picking the archive that
  carries the block and has the highest read load weighting. If that
  archive is unavailable or has lost the block, the others will be
  tried in read load order; and if none of them have it, an exhaustive
  search of all available archives will be performed before giving up,
  and the cache updated with the results if the block is found. Users
  will be recommended to delete the cache if an archive is lost, so it
  gets recreated in usage, as otherwise the system may assume blocks
  are present when they are not, and thus fail to upload them when
  snapshotting.

## Core

* Better error handling. Right now we give up if we can't read a file
  or directory. It would be awesomer to print a warning but continue
  to archive everything else.

* More `.ugarit` actions. Right now we just have exclude and include;
  we might specify less-safe operations such as commands to run before
  and after snapshotting certain subtrees, or filters (don't send this
  SVN repository; instead send the output of `svnadmin dump`),
  etc. Running arbitrary commands is a security risk if random users
  write their own `.ugarit` files - so we'd need some trust-based
  mechanism; they'd need to be explicitly enabled in `ugarit.conf`,
  then a `.ugarit` option could disable all unsafe operations in a
  subtree.

* Support for FFS flags, Mac OS X extended filesystem attributes, NTFS
  ACLs/streams, FAT attributes, etc... Ben says to look at Box Backup
  for some code to do that sort of thing.

* Implement lock-tag! etc. in backend-fs, as a precaution against two
  concurrent snapshots racing over updating the tag, where concurrent
  access to the archive is even possible.

* Deletion support - letting you remove snapshots. Perhaps you might
  want to remove all snapshots older than a given number of days on a
  given tag. Or just remove X out of Y snapshots older than a given
  number of days on a given tag. We have the core support for this;
  just find a snapshot and `unlink-directory!` it, leaving a dangling
  pointer from the snapshot, and write the snapshot handling code to
  expect this. Again, check Box Backup for that.

* Some kind of accounting for storage usage by snapshot. It'd be nice
  to track, as we write a snapshot to the archive, how many bytes we
  reuse and how many we back up. We can then store this in the
  snapshot metadata, and so report it somewhere. The blocks uploaded
  by a snapshot may well then be reused by other snapshots later on,
  so it wouldn't be a true measure of 'unique storage', nor a measure
  of what you'd reclaim by deleting that snapshot, but it'd be
  interesting anyway.

* Option, when backing up, to not cross mountpoints.

* Option, when backing up, to store the inode number and mountpoint
  path in directory entries, and then, when extracting, to keep a
  dictionary mapping this unique identifier to the pathname, so that
  if a file to be extracted is already in the dictionary and the hash
  is the same, a hardlink can be created.

* Archival mode as well as snapshot mode. Whereas a snapshot record
  takes a filesystem tree and adds it to a chain of snapshots of the
  same filesystem tree, archival mode takes a filesystem tree and
  inserts it into a search tree anchored on the specified tag,
  indexing it on a list of key+value properties supplied at archival
  time. An archive tag is represented in the virtual filesystem as a
  directory full of archive objects, each identified by its full
  hash; each archive object references the filesystem root as well as
  the key+value properties, and optionally a parent link like a
  snapshot, as an archive can be made that explicitly replaces an
  earlier one and should replace it in the index. There is also a
  virtual directory for each indexed property, which contains a
  directory for each value of the property, full of symlinks to the
  archive objects, and subdirectories that allow multi-property
  searches on other properties. The index itself is stored as a B-Tree
  with a reasonably small block size; when it's updated, the modified
  index blocks are replaced, thereby gaining new hashes, so their
  parents need replacing, all the way up the tree, until a new root
  block is created. The existing block unlink mechanism in the
  backends will reclaim storage for blocks that are superseded, if the
  backend supports it. When this is done, ugarit will offer the option
  of snapshotting to a snapshot tag, or archiving to an archive tag,
  or archiving to an archive tag while replacing a specified archive
  object (nominated by path within the tag), which causes it to be
  removed from the index (except from the directory listing all
  archives by hash), and the new archive object is inserted,
  referencing the old one as a parent.

## Front-end

* Better error messages

* Archive transfer: a command to open two archives. From the source
  one, it lists all tags, then for each tag, walks the history, and
  for each snapshot, copies it to the destination archive. For
  migrating archives to a new backend.

* FUSE support. Mount it as a read-only filesystem :-D Then consider
  adding Fossil-style writing to the `current` of a snapshot, with
  copy-on-write of blocks to a buffer area on the local disk, then the
  option to make a snapshot of `current`.

* More explicit support for archival usage: really, a different kind
  of tag. Rather than having a chain of snapshots of the same
  filesystem, the tag would have some kind of database of snapshots,
  with more emphasis on metadata and searchability.

* Filesystem watching. Even with the hash-caching trick, a snapshot
  will still involve walking the entire directory tree and looking up
  every file in the hash cache. We can do better than that - some
  platforms provide an interface for receiving real-time notifications
  of changed or added files. Using this, we could allow ugarit to run
  in continuous mode, keeping a log of file notifications from the OS
  while it does an initial full snapshot. It can then wait for a
  specified period (one hour, perhaps?), accumulating names of files
  changed since it started, before creating a new snapshot by
  uploading just the files it knows to have changed, while subsequent
  file change notifications go to a new list.

## Testing

* An option to verify a snapshot, walking every block in it, checking
  that there are no dangling references and that everything matches
  its hash, without needing to put it into a filesystem, and applying
  any other sanity checks we can think of en route. Optionally compare
  it to an on-disk filesystem, while we're at it.

* A more formal test corpus, with a unit test script around the
  `ugarit` command-line tool; the corpus should contain a mix of tiny
  and huge files and directories, awkward cases for sharing of blocks
  (many identical files in the same dir, etc), complex forms of file
  metadata, and so on. It should archive and restore the corpus
  several times over with each hash, compression, and encryption
  option.

# Acknowledgements

The original idea came from Venti, a content-addressed storage system
from Plan 9. Venti is usable directly by user applications, and is
also integrated with the Fossil filesystem to support snapshotting the
status of a Fossil filesystem. Fossil allows references to be either
to a block number on the Fossil partition or to a Venti key; so when a
filesystem has been snapshotted, all it now contains is a "root
directory" pointer into the Venti archive, and any files modified
thereafter are copied-on-write into Fossil, where they may be modified
until the next snapshot.

We're nowhere near that exciting yet, but using FUSE, we might be able
to do something similar, which might be fun. However, Venti inspired
me when I read about it years ago; it showed me how elegant
content-addressed storage is. Finding out that the Git version control
system used the same basic tricks really just confirmed this for me.

Also, I'd like to tip my hat to Duplicity. With the changing economics
of storage presented by services like Amazon S3 and rsync.net, I
looked to Duplicity, as it provided both SFTP and S3 backends. However,
it worked in terms of full and incremental backups, a model that I
think made sense for magnetic tapes, but loses out to
content-addressed snapshots when you have random-access
media. Duplicity inspired me by its adoption of multiple backends, the
very backends I want to use, but I still hungered for a
content-addressed snapshot store.

I'd also like to tip my hat to Box Backup. I've only used it a little,
because it requires a special server to manage the storage (and I want
to get my backups *off* of my servers), but it also inspires me with
directions I'd like to take Ugarit. It's much more aware of real-time
access to random-access storage than Duplicity, and has a very
interesting continuous background incremental backup mode, moving away
from the tape-based paradigm of backups as something you do on a
special day of the week, like some kind of religious observance. I
hope the author Ben, who is a good friend of mine, won't mind me
plundering his source code for details on how to request real-time
notification of changes from the filesystem, and how to read and write
extended attributes!

Moving on from the world of backup, I'd like to thank the Chicken Team
for producing Chicken Scheme. Felix, Peter, Elf, and Alex have
particularly inspired me with their can-do attitudes to combining
programming-language elegance and pragmatic engineering - two things
many would think un-unitable enemies. Of course, they didn't do it all
themselves - R5RS Scheme and the SRFIs provided a solid foundation to
build on, and there's a cast of many more in the Chicken community,
working on other bits of Chicken or just egging everyone on. And I
can't not thank Henry Baker for writing the seminal paper on the
technique Chicken uses to implement full tail-calling Scheme with
cheap continuations on top of C; Henry already had my admiration for
his work on combining elegance and pragmatism in linear logic. Why
doesn't he return my calls? I even sent flowers.

Thanks to the early adopters who brought me useful feedback, too!

And I'd like to thank my wife for putting up with me spending several
evenings working on this thing...

# Version history

* 0.6: `.ugarit` support

* 0.5: Keyed hashing so attackers can't tell what blocks you have,
  markers in logs so the index can be reconstructed, sha2 support, and
  passphrase support.

* 0.4: AES encryption

* 0.3: Added splitlog backend, and fixed a .meta file typo

* 0.2: Initial public release

* 0.1: Internal development release