# Introduction

Ugarit is a backup/archival system based around content-addressable storage.

This allows it to upload incremental backups to a remote server or a local filesystem such as an NFS share or a removable hard disk, yet have the archive instantly able to produce a full snapshot on demand rather than needing to download a full snapshot plus all the incrementals since. The content-addressable storage technique means that the incrementals can be applied to a snapshot on various kinds of storage without needing intelligence in the storage itself - so the snapshots can live within Amazon S3 or on a removable hard disk.

Also, the same storage can be shared between multiple systems that all back up to it - and the incremental upload algorithm will mean that any files shared between the servers will only need to be uploaded once. If you back up a complete server, then go and back up another that is running the same distribution, then all the files in `/bin` and so on that are already in the storage will not need to be backed up again; the system will automatically spot that they're already there, and not upload them again.

## So what's that mean in practice?

You can run Ugarit to back up any number of filesystems to a shared archive, and on every backup, Ugarit will only upload files or parts of files that aren't already in the archive - be they from the previous snapshot, earlier snapshots, snapshots of entirely unrelated filesystems, etc. Every time you do a snapshot, Ugarit builds a complete directory tree of the snapshot in the archive - but it reuses any parts of files, files, or entire directories that already exist anywhere in the archive, and only uploads what doesn't already exist.

The support for parts of files means that, in many cases, gigantic files like database tables and virtual disks for virtual machines will not need to be uploaded entirely every time they change, as only the changed sections will be identified and uploaded.

Because a complete directory tree exists in the archive for any snapshot, the extraction algorithm is incredibly simple - and, therefore, incredibly reliable and fast. Simple, reliable, and fast are just what you need when you're trying to reconstruct the filesystem of a live server.

Also, it means that you can do lots of small snapshots. If you run a snapshot every hour, then only a megabyte or two might have changed in your filesystem, so you only upload a megabyte or two - yet you end up with a complete history of your filesystem at hourly intervals in the archive.

Conventional backup systems usually store a full backup followed by incrementals in their archives, meaning that a restore involves reading the full backup and then reading and applying every incremental since - so to do a restore, you either have to download *every version* of the filesystem you've ever uploaded, or you have to do periodic full backups (even though most of your filesystem won't have changed since the last full backup) to reduce the number of incrementals required for a restore. Better results are had from systems that use a special backup server to look after the archive storage, which accepts incremental backups and applies them to the snapshot it keeps in order to maintain a most-recent snapshot that can be downloaded in a single run; but such systems restrict you to using dedicated servers as your archive stores, ruling out cheap scalable solutions like Amazon S3, or just backing up to a removable USB or eSATA disk you attach to your system whenever you do a backup. And dedicated backup servers are complex pieces of software; can you rely on something complex for the fundamental foundation of your data security system?

## System Requirements

Ugarit should run on any POSIX-compliant system that can run [Chicken Scheme](http://www.call-with-current-continuation.org/). It stores and restores all the file attributes reported by the `stat` system call - POSIX mode permissions, UID, GID, mtime, and optionally atime and ctime (although the ctime cannot be restored due to POSIX restrictions). Ugarit will store files, directories, device and character special files, symlinks, and FIFOs.

Support for extended filesystem attributes - ACLs, alternative streams, forks and other metadata - is possible, due to the extensible directory entry format; support for such metadata will be added as required.

Currently, only local filesystem-based archive storage backends are complete: these are suitable for backing up to a removable hard disk or a filesystem shared via NFS or other protocols. They can also be used to snapshot to local disks, although this is obviously then vulnerable to local system failures; if the computer that's being backed up catches fire, you won't be able to restore it from archives that were also ruined!

However, the next backend to be implemented will be one for Amazon S3, and an SFTP backend for storing archives anywhere you can ssh to. Other backends will be implemented on demand; an archive can, in principle, be stored on anything that can store files by name, report on whether a file already exists, and efficiently download a file by name. This rules out magnetic tapes due to their requirement for sequential access.

Although we need to trust that a backend won't lose data (for now), we don't need to trust the backend not to snoop on us, as Ugarit optionally encrypts everything sent to the archive.

## What's in an archive?

An Ugarit archive contains a load of blocks, each up to a maximum size (usually 1MiB, although other backends might impose smaller limits). Each block is identified by the Tiger hash of its contents; this is how Ugarit avoids ever uploading the same data twice: before uploading anything, it looks up the hash to see whether that data already exists in the archive. The contents of the blocks are compressed and then encrypted before upload.

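The following is a minimal sketch of that content-addressing idea, in Python purely for illustration (Ugarit itself is written in Chicken Scheme, and uses Tiger-192 rather than the SHA-256 stand-in used here); the `archive` dictionary plays the role of a storage backend:

    import hashlib

    def store_block(archive, block):
        """Store `block` under its content hash, skipping the upload if it is already present."""
        key = hashlib.sha256(block).hexdigest()  # Ugarit would use Tiger-192 of the contents
        if key not in archive:                   # ask the backend whether it already has this key
            archive[key] = block                 # only genuinely new data is uploaded
        return key                               # parent index/directory blocks record this key

    archive = {}
    k1 = store_block(archive, b"some file contents")
    k2 = store_block(archive, b"some file contents")  # a second copy: same key, no new upload
    assert k1 == k2 and len(archive) == 1
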
Every file uploaded is, unless it's small enough to fit in a single block, chopped into blocks, and each block uploaded. This way, the entire contents of your filesystem can be uploaded - or, at least, only the parts of it that aren't already there! The blocks are then tied together to create a snapshot by uploading index blocks full of the Tiger hashes of the data blocks, and directory blocks listing the names and attributes of files in directories, along with the hashes of the blocks that contain the files' contents. Even the blocks that contain lists of hashes of other blocks are subject to checking for pre-existence in the archive; if only a few MiB of your hundred-GiB filesystem has changed, then even the index blocks and directory blocks are re-used from previous snapshots.

Once uploaded, a block in the archive is never again changed. After all, if its contents changed, its hash would change, so it would no longer be the same block! However, every block has a reference count, tracking the number of index blocks that refer to it. This means that the archive knows which blocks are shared between multiple snapshots (or shared *within* a snapshot - if a filesystem has more than one copy of the same file, still only one copy is uploaded), so that if a given snapshot is deleted, then the blocks that only that snapshot is using can be deleted to free up space, without corrupting other snapshots by deleting blocks they share. Bear in mind, however, that not all storage backends may support this - there are certain advantages to being an append-only archive. For a start, you can't delete something by accident! The supplied filesystem backend supports deletion, while the logfile backend does not. However, the actual deletion command hasn't been implemented yet either, so it's a moot point for now...

Finally, the archive contains objects called tags. Unlike the blocks, a tag's contents can change, and tags have meaningful names rather than being identified by hash. Tags identify the top-level blocks of snapshots within the system, from which (by following the chain of hashes down through the index blocks) the entire contents of a snapshot may be found. Unless you happen to have recorded the hash of a snapshot somewhere, the tags are where you find snapshots when you want to do a restore!

Whenever a snapshot is taken, as soon as Ugarit has uploaded all the files, directories, and index blocks required, it looks up the tag you have identified as the target of the snapshot. If the tag already exists, then the snapshot it currently points to is recorded in the new snapshot as the "previous snapshot"; then a snapshot header containing the previous snapshot's hash, along with the date and time and any comments you provide for the snapshot, is uploaded (as another block, identified by its hash). The tag is then updated to point to the new snapshot.

This way, each tag actually identifies a chronological chain of snapshots. Normally, you would use a tag to identify a filesystem being archived; you'd keep snapshotting the filesystem to the same tag, resulting in all the snapshots of that filesystem hanging from the tag. But if you wanted to remember any particular snapshot (perhaps if it's the snapshot you take before a big upgrade or other risky operation), you can duplicate the tag, in effect 'forking' the chain of snapshots much like a branch in a version control system.

# Using Ugarit

## Installation

Install [Chicken Scheme](http://www.call-with-current-continuation.org/) using their [installation instructions](http://chicken.wiki.br/Getting%20started#Installing%20Chicken).

Ugarit can then be installed by typing (as root):

    chicken-install ugarit

See the [chicken-install manual](http://wiki.call-cc.org/manual/Extensions#chicken-install-reference) for details if you have any trouble, or wish to install into your home directory.

## Setting up an archive

Firstly, you need to know the archive identifier for the place you'll be storing your archives. This depends on your backend.

### Filesystem backend

The filesystem backend creates archives by storing each block or tag in its own file, in a directory. To keep the objects-per-directory count down, it'll split the files into subdirectories.

To set up a new filesystem-backend archive, just create an empty directory that Ugarit will have write access to when it runs. It will probably run as root in order to be able to access the contents of files that aren't world-readable (although that's up to you), so be careful of NFS mounts that have `maproot=nobody` set!

You can then refer to it using the following archive identifier:

      "backend-fs fs ...path to directory..."

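For example, if you've created an empty directory at `/mnt/backups/ugarit` (an illustrative path), the identifier would be:

      "backend-fs fs /mnt/backups/ugarit"
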
### Logfile backend

The logfile backend works much like the original Venti system. It's
append-only - you won't be able to delete old snapshots from a logfile
archive, even when I implement deletion. It stores the archive in two
sets of files; one is a log of data blocks, split at a specified
maximum size, and the other is the metadata: an sqlite database used
to track the location of blocks in the log files, the contents of
tags, and a count of the logs so a filename can be chosen for a new one.

To set up a new logfile archive, just choose where to put the two
parts. It would be nice to put the metadata file on a different
physical disk to the logs directory, to reduce seeking. If you only
have one disk, you can put the metadata file in the log directory
("metadata" is a good name).

You can then refer to it using the following archive identifier:

      "backend-fs splitlog ...log directory... ...metadata file... max-logfile-size"

For most platforms, a max-logfile-size of 900000000 (900 MB) should suffice. For now, don't go much bigger than that on 32-bit systems until Chicken's `file-position` function is fixed to work with files >1GB in size.

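For instance, with the logs kept in `/archive/logs` and the metadata file at `/archive/metadata` (illustrative paths), the identifier would be:

      "backend-fs splitlog /archive/logs /archive/metadata 900000000"
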
## Writing a ugarit.conf

`ugarit.conf` should look something like this:

      (storage <archive identifier>)
      (hash tiger "<A secret string>")
      [(compression [deflate|lzma])]
      [(encryption aes <key>)]
      [(file-cache "<path>")]
      [(rule ...)]

The hash line chooses a hash algorithm. Currently Tiger-192 (`tiger`), SHA-256 (`sha256`), SHA-384 (`sha384`) and SHA-512 (`sha512`) are supported; if you omit the line then Tiger will still be used, but it will be a simple hash of the block with the block type appended, which reveals to attackers what blocks you have (as the hash is of the unencrypted block, and the hash is not encrypted). This is useful for development and testing or for use with trusted archives, but not advised for use with archives that attackers may snoop at. Providing a secret string produces a hash function that hashes the block, the type of block, and the secret string, producing hashes that attackers who can snoop the archive cannot use to find known blocks. Whichever hash function you use, you will need to install the required Chicken egg with one of the following commands:

    chicken-install -s tiger-hash  # for tiger
    chicken-install -s sha2        # for the SHA hashes

`lzma` is the recommended compression option for low-bandwidth backends or when space is tight, but it's very slow to compress; deflate or no compression at all are better for fast local archives. To have no compression at all, just remove the `(compression ...)` line entirely. Likewise, to use compression, you need to install a Chicken egg:

       chicken-install -s z3       # for deflate
       chicken-install -s lzma     # for lzma

Likewise, the `(encryption ...)` line may be omitted to have no encryption; the only currently supported algorithm is aes (in CBC mode), with a key given in hex, as a passphrase (hashed to get a key), or as a passphrase read from the terminal on every run. The key may be 16, 24, or 32 bytes for 128-bit, 192-bit or 256-bit AES. To specify a hex key, just supply it as a string, like so:

      (encryption aes "00112233445566778899AABBCCDDEEFF")

...for 128-bit AES,

      (encryption aes "00112233445566778899AABBCCDDEEFF0011223344556677")

...for 192-bit AES, or

      (encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")

...for 256-bit AES.

Alternatively, you can provide a passphrase, and specify how large a key you want it turned into, like so:

      (encryption aes ([16|24|32] "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))

Finally, the extra-paranoid can request that Ugarit prompt for a passphrase on every run and hash it into a key of the specified length, like so:

      (encryption aes ([16|24|32] prompt))

(note the lack of quotes around `prompt`, distinguishing it from a passphrase)

Again, as it is an optional feature, to use encryption you must install the appropriate Chicken egg:

       chicken-install -s aes

A file cache, if enabled, significantly speeds up subsequent snapshots
of a filesystem tree. The file cache is a file (which Ugarit will
create if it doesn't already exist) mapping filenames to
(mtime,hash,size) tuples; as it scans the filesystem, if it finds a
file in the cache and the mtime and size have not changed, it will
assume it is already archived under the specified hash. This saves it
from having to read the entire file to hash it and then check if the
hash is present in the archive. In other words, if only a few files
have changed since the last snapshot, then snapshotting a directory
tree becomes an O(N) operation, where N is the number of files, rather
than an O(M) operation, where M is the total size of files involved.

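For instance, the file cache might be enabled with a line like the following (the path is purely illustrative; any location Ugarit can write to will do):

      (file-cache "/var/ugarit/file-cache")
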
For example:

      (storage splitlog "/net/spiderman/archive/logs" "/net/spiderman/archive/index" 900000000)
      (hash tiger "Giung0ahKahsh9ahphu5EiGhAhth4eeyDahs2aiWAlohr6raYeequ8uiUr3Oojoh")
      (encryption aes (32 "deing2Aechediequohdo6Thuvu0OLoh6fohngio9koush9euX6el9iesh6Aef4augh3WiY7phahmesh2Theeziniem5hushai5zigushohnah1quae1ooXo0eingu1Aifeo1eeSheaz9ieSie9tieneibeiPho0quu6um8weiyagh4kaeshooThooNgeyoul2Ahsahgh8imohw3hoyazai9gaph5ohhaechiedeenusaeghahghipe8ii3oo9choh5cieth5iev3jiedohquai4Thiedah5sah5kohcepheixai3aiPainozooc6zohNeiy6Jeigeesie5eithoo0ciiNae8Nee3eiSuKaiza0VaiPai2eeFooNgeengaif9yaiv9rathuoQuohy0ohth6OiL9aisaetheeWoh9aiQu0yoo6aequ3quoiChi7joonohwuvaipeuh2eiPoogh1Ie8tiequesoshaeBue5ieca8eerah0quieJoNoh3Jiesh1chei8weidixeen1yah1ioChie0xaimahWeeriex5eetiichahP9iey5ux7ahGhei7eejahxooch5eiqu0Pheir9Reiri4ahqueijuchae8eeyieMeixa4ciisioloe9oaroof1eegh4idaeNg5aepeip8mah7ixaiSohtoxaiH4oe5eeGoh4eemu7mee8ietaecu6Zoodoo0hoP5uquaish2ahc7nooshi0Aidae2Zee4pheeZee3taerae6Aepu2Ayaith2iivohp8Wuikohvae2Peange6zeihep8eC9mee8johshaech1Ubohd4Ko5caequaezaigohyai1TheeN6Gohva6jinguev4oox2eet5auv0aiyeo7eJieGheebaeMahshifaeDohy8quut4ueFei3eiCheimoechoo2EegiveeDah1sohs7ezee3oaWa2iiv2Chi1haiS5ahph4phu5su0hiocee3ooyaeghang7sho7maiXeo5aex"))
      (compression lzma)

Be careful to put a set of parentheses around each configuration entry. White space isn't significant, so feel free to indent things and wrap them over lines if you want.

Keep copies of this file safe - you'll need it to do extractions!
Print a copy out and lock it in your fire safe! Ok, currently, you
might be able to recreate it if you remember where you put the
storage, but encryption keys are harder to remember.

## Your first backup

Think of a tag to identify the filesystem you're backing up. If it's `/home` on the server `gandalf`, you might call it `gandalf-home`. If it's the entire filesystem of the server `bilbo`, you might just call it `bilbo`.

Then from your shell, run (as root):

      # ugarit snapshot <ugarit.conf> [-c] [-a] <tag> <path to root of filesystem>

For example, if we have a `ugarit.conf` in the current directory:

      # ugarit snapshot ugarit.conf -c localhost-etc /etc

Specify the `-c` flag if you want to store ctimes in the archive; since it's impossible to restore ctimes when extracting from an archive, doing this is useful only for informational purposes, so it's not done by default. Similarly, atimes aren't stored in the archive unless you specify `-a`, because otherwise, there will be a lot of directory blocks uploaded on every snapshot, as the atime of every file will have been changed by the previous snapshot - so with `-a` specified, on every snapshot, every directory in your filesystem will be uploaded! Ugarit will happily restore atimes if they are found in an archive; their storage is made optional simply because uploading them is costly and rarely useful.

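For example, to record both ctimes and atimes in a snapshot of the same illustrative tag as above:

      # ugarit snapshot ugarit.conf -c -a localhost-etc /etc
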
## Exploring the archive

Now you have a backup, you can explore the contents of the archive. This need not be done as root, as long as you can read `ugarit.conf`; however, if you want to extract files, run it as root.

      $ ugarit explore <ugarit.conf>

This will put you into an interactive shell exploring a virtual filesystem. The root directory contains an entry for every tag; if you type `ls` you should see your tag listed, and within that tag, you'll find a list of snapshots, in descending date order, with a special entry `current` for the most recent snapshot. Within a snapshot, you'll find the root directory of your snapshot, and will be able to `cd` into subdirectories, and so on:

      > ls
      Test <tag>
      > cd Test
      /Test> ls
      2009-01-24 10:28:16 <snapshot>
      2009-01-24 10:28:16 <snapshot>
      current <snapshot>
      /Test> cd current
      /Test/current> ls
      README.txt <file>
      LICENCE.txt <symlink>
      subdir <dir>
      .svn <dir>
      FIFO <fifo>
      chardev <character-device>
      blockdev <block-device>
      /Test/current> ls -ll LICENCE.txt
      lrwxr-xr-x 1000 100 2009-01-15 03:02:49 LICENCE.txt -> subdir/LICENCE.txt
      target: subdir/LICENCE.txt
      ctime: 1231988569.0

As well as exploring around, you can also extract files or directories (or entire snapshots) by using the `get` command. Ugarit will do its best to restore the metadata of files, subject to the rights of the user you run it as.

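For instance, extracting a single file from the snapshot explored above might look something like this (the exact argument syntax is best checked with `help`, as it may vary between versions):

      /Test/current> get README.txt
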
Type `help` to get help in the interactive shell.

## Duplicating tags

As mentioned above, you can duplicate a tag, creating two tags that refer to the same snapshot and its history but that can then have their own subsequent history of snapshots applied to each independently, with the following command:

      $ ugarit fork <ugarit.conf> <existing tag> <new tag>

## `.ugarit` files

By default, Ugarit will archive everything it finds in the filesystem tree you tell it to snapshot. However, this might not always be desired; so we provide the facility to override this with `.ugarit` files, or global rules in your `.conf` file.

Note: The syntax of these files is provisional, as I want to experiment with usability; the current syntax is ugly. So please don't be surprised if the format changes in incompatible ways in subsequent versions!

In quick summary, if you want to ignore all files or directories matching a glob in the current directory and below, put the following in a `.ugarit` file in that directory:

      (* (glob "*~") exclude)

You can write quite complex expressions as well as just globs. The full set of rules is as follows (a combined example follows the list):

* `(glob "`*pattern*`")` matches files and directories whose names match the glob pattern
* `(name "`*name*`")` matches files and directories with exactly that name (useful for files called `*`...)
* `(modified-within ` *number* ` seconds)` matches files and directories modified within the given number of seconds
* `(modified-within ` *number* ` minutes)` matches files and directories modified within the given number of minutes
* `(modified-within ` *number* ` hours)` matches files and directories modified within the given number of hours
* `(modified-within ` *number* ` days)` matches files and directories modified within the given number of days
* `(not ` *rule*`)` matches files and directories that do not match the given rule
* `(and ` *rule* *rule...*`)` matches files and directories that match all the given rules
* `(or ` *rule* *rule...*`)` matches files and directories that match any of the given rules

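For instance, combining these forms, the following (purely illustrative) rule in a `.ugarit` file would exclude log files that haven't been modified within the last week:

      (* (and (glob "*.log") (not (modified-within 7 days))) exclude)
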
Also, you can override a previous exclusion with an explicit include in a lower-level directory:

    (* (glob "*~") include)

Also, you can bind rules to specific directories, rather than to "this directory and all beneath it", by specifying an absolute or relative path instead of the `*`:

    ("/etc" (name "passwd") exclude)

If you use a relative path, it's taken relative to the directory of the `.ugarit` file.

You can also put some rules in your `.conf` file, although relative paths are illegal there, by adding lines of this form to the file:

    (rule * (glob "*~") exclude)

# Questions and Answers

## What happens if a snapshot is interrupted?

Nothing! Whatever blocks have already been uploaded stay in the archive, but the snapshot is only added to the tag once the entire filesystem has been snapshotted. So just start the snapshot again. Any files that have already been uploaded will then not need to be uploaded again, so the second snapshot should proceed quickly to the point where it failed before, and continue from there.

Unless the archive ends up with a partially-uploaded corrupted block due to being interrupted during upload, you'll be fine. The filesystem backend has been written to avoid this by writing the block to a file with the wrong name, then renaming it to the correct name when it's entirely uploaded.

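A minimal sketch of that write-then-rename trick, in Python purely for illustration (the temporary-name convention here is made up, not what backend-fs actually uses):

    import os

    def write_block_atomically(path, data):
        tmp = path + ".tmp"           # hypothetical temporary name
        with open(tmp, "wb") as f:
            f.write(data)             # a crash here leaves only the temporary file behind
        os.rename(tmp, path)          # the block only appears under its real name once complete
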
## Should I share a single large archive between all my filesystems?

I think so. Using a single large archive means that blocks shared between servers - eg, software installed from packages and that sort of thing - will only ever need to be uploaded once, saving storage space and upload bandwidth.

# Future Directions

Here's a list of planned developments, in approximate priority order:

## General

* Everywhere I use (sql ...) to create an sqlite prepared statement,
  don't. Create them all up-front and reuse the resulting statement
  objects; it'll save memory and time.

* Migrate the source repo to Fossil (when there's a
  kitten-technologies.co.uk migration to Fossil), and update the egg
  locations thingy.

## Backends

* Support for recreating the index and tags on a backend-splitlog if
  they get corrupted, from the headers left in the log. Do this by
  extending the backend protocol with a special "admin" command that
  allows for arbitrary backend-specific operations, and write an
  ugarit-backend-admin CLI tool to administer backends with it.

* Support for unlinking in backend-splitlog, by marking byte ranges as
  unused in the metadata (and by touching the headers in the log so we
  maintain the invariant that the metadata is a reconstructible cache)
  and removing the entries for the unlinked blocks; perhaps provide an
  option to attempt to re-use existing holes to put blocks in for
  online reuse, and provide an offline compaction operation.

* Support for SFTP as a storage backend. Store one file per block, as
  per `backend-fs`, but remotely. See
  http://tools.ietf.org/html/draft-ietf-secsh-filexfer-13 for sftp
  protocol specs; popen an `ssh -s sftp` connection to the server then
  talk that simple binary protocol. Tada!

* Support for S3 as a storage backend. There is now an S3 egg!

* Support for replicated archives. This will involve a special storage
  backend that can wrap any number of other archives, each tagged with
  a trust percentage and read and write load weightings. Each block
  will be uploaded to enough archives to make the total trust be at
  least 100%, by randomly picking the archives weighted by their write
  load weighting. A local cache will be kept of which backends carry
  which blocks, and reads will be serviced by picking the archive that
  carries it and has the highest read load weighting. If that archive
  is unavailable or has lost the block, then they will be tried in
  read load order; and if none of them have it, an exhaustive search
  of all available archives will be performed before giving up, and
  the cache updated with the results if the block is found. Users will
  be recommended to delete the cache if an archive is lost, so it gets
  recreated in usage, as otherwise the system may assume blocks are
  present when they are not, and thus fail to upload them when
  snapshotting. The individual physical archives that we put
  replication on top of won't be "valid" archives unless they are 100%
  replicated, as they'll contain references to blocks that are on
  other archives. It might be a good idea to mark them as such with a
  special tag to avoid people trying to restore directly from them. A
  copy of the replication configuration could be stored under a
  special tag to mark this fact, and to enable easy finding of the
  proper replicated archive to work from.

## Core

* API documentation for the units we export

* More `.ugarit` actions. Right now we just have exclude and include;
  we might specify less-safe operations such as commands to run before
  and after snapshotting certain subtrees, or filters (don't send this
  SVN repository; instead send the output of `svnadmin dump`),
  etc. Running arbitrary commands is a security risk if random users
  write their own `.ugarit` files - so we'd need some trust-based
  mechanism; they'd need to be explicitly enabled in `ugarit.conf`,
  then a `.ugarit` option could disable all unsafe operations in a
  subtree.

* Support for FFS flags, Mac OS X extended filesystem attributes, NTFS
  ACLs/streams, FAT attributes, etc... Ben says to look at Box Backup
  for some code to do that sort of thing.

* Implement lock-tag! etc. in backend-fs, as a precaution against two
  concurrent snapshots racing over updating the tag, where concurrent
  access to the archive is even possible.

* Deletion support - letting you remove snapshots. Perhaps you might
  want to remove all snapshots older than a given number of days on a
  given tag. Or just remove X out of Y snapshots older than a given
  number of days on a given tag. We have the core support for this;
  just find a snapshot and `unlink-directory!` it, leaving a dangling
  pointer from the snapshot, and write the snapshot handling code to
  expect this. Again, check Box Backup for that.

* Some kind of accounting for storage usage by snapshot. It'd be nice
  to track, as we write a snapshot to the archive, how many bytes we
  reuse and how many we back up. We can then store this in the
  snapshot metadata, and so report them somewhere. The blocks uploaded
  by a snapshot may well then be reused by other snapshots later on,
  so it wouldn't be a true measure of 'unique storage', nor a measure
  of what you'd reclaim by deleting that snapshot, but it'd be
  interesting anyway.

* Option, when backing up, to not cross mountpoints

* Option, when backing up, to store inode number and mountpoint path
  in directory entries, and then, when extracting, to keep a dictionary
  of this unique identifier to pathname, so that if a file to be
  extracted is already in the dictionary and the hash is the same, a
  hardlink can be created.

* Archival mode as well as snapshot mode. Whereas a snapshot record
  takes a filesystem tree and adds it to a chain of snapshots of the
  same filesystem tree, archival mode takes a filesystem tree and
  inserts it into a search tree anchored on the specified tag,
  indexing it on a list of key+value properties supplied at archival
  time. An archive tag is represented in the virtual filesystem as a
  directory full of archive objects, each identified by their full
  hash; each archive object references the filesystem root as well as
  the key+value properties, and optionally a parent link like a
  snapshot, as an archive can be made that explicitly replaces an
  earlier one and should replace it in the index; there is also a
  virtual directory for each indexed property which contains a
  directory for each value of the property, full of symlinks to the
  archive objects, and subdirectories that allow multi-property
  searches on other properties. The index itself is stored as a B-Tree
  with a reasonably small block size; when it's updated, the modified
  index blocks are replaced, thereby gaining new hashes, so their
  parents need replacing, all the way up the tree until a new root
  block is created. The existing block unlink mechanism in the
  backends will reclaim storage for blocks that are superseded, if the
  backend supports it. When this is done, ugarit will offer the option
  of snapshotting to a snapshot tag, or archiving to an archive tag,
  or archiving to an archive tag while replacing a specified archive
  object (nominated by path within the tag), which causes it to be
  removed from the index (except from the directory listing all
  archives by hash), and the new archive object is inserted,
  referencing the old one as a parent.

* Dump/restore format. On a dump, walk an arbitrary subtree of an
  archive, serialising objects. Do not put any hashes in the dump
  format - dump out entire files, and just identify objects with
  sequential numbers when forming the directory / snapshot trees. On a
  restore, read the same format and slide it into an archive (creating
  any required top-level snapshot objects if the dump doesn't start
  from a snapshot) and putting it onto a specified tag. The
  intention is that this format can be used to migrate your stuff
  between archives, perhaps to change to a better backend.

## Front-end

* Better error messages

* Line editing in the "explore" CLI, ideally with tab completion

* API mode: Works something like the backend API, except at the
  archive level. Supports all the important archive operations, plus
  access to sexpr stream writers and key stream writers,
  archive-node-fold, etc. Requested by andyjpb, perhaps I can write
  the framework for this and then let him add API functions as he desires.

* FUSE support. Mount it as a read-only filesystem :-D Then consider
  adding Fossil-style writing to the `current` of a snapshot, with
  copy-on-write of blocks to a buffer area on the local disk, then the
  option to make a snapshot of `current`.

* Filesystem watching. Even with the hash-caching trick, a snapshot
  will still involve walking the entire directory tree and looking up
  every file in the hash cache. We can do better than that - some
  platforms provide an interface for receiving real-time notifications
  of changed or added files. Using this, we could allow ugarit to run
  in continuous mode, keeping a log of file notifications from the OS
  while it does an initial full snapshot. It can then wait for a
  specified period (one hour, perhaps?), accumulating names of files
  changed since it started, before then creating a new snapshot by
  uploading just the files it knows to have changed, while subsequent
  file change notifications go to a new list.

## Testing

* An option to verify a snapshot, walking every block in it checking
  there are no dangling references, and that everything matches its
  hash, without needing to put it into a filesystem, and applying any
  other sanity checks we can think of en route. Optionally compare it
  to an on-disk filesystem, while we're at it.

* A unit test script around the `ugarit` command-line tool; the corpus
  should contain a mix of tiny and huge files and directories, awkward
  cases for sharing of blocks (many identical files in the same dir,
  etc), complex forms of file metadata, and so on. It should archive
  and restore the corpus several times over with each hash,
  compression, and encryption option.

# Acknowledgements

The original idea came from Venti, a content-addressed storage system
from Plan 9. Venti is usable directly by user applications, and is
also integrated with the Fossil filesystem to support snapshotting the
status of a Fossil filesystem. Fossil allows references to either be
to a block number on the Fossil partition or to a Venti key; so when a
filesystem has been snapshotted, all it now contains is a "root
directory" pointer into the Venti archive, and any files modified
thereafter are copied-on-write into Fossil where they may be modified
until the next snapshot.

We're nowhere near that exciting yet, but using FUSE, we might be able
to do something similar, which might be fun. However, Venti inspired
me when I read about it years ago; it showed me how elegant
content-addressed storage is. Finding out that the Git version control
system used the same basic tricks really just confirmed this for me.

Also, I'd like to tip my hat to Duplicity. With the changing economics
of storage presented by services like Amazon S3 and rsync.net, I
looked to Duplicity as it provided both SFTP and S3 backends. However,
it worked in terms of full and incremental backups, a model that I
think made sense for magnetic tapes, but loses out to
content-addressed snapshots when you have random-access
media. Duplicity inspired me by its adoption of multiple backends, the
very backends I want to use, but I still hungered for a
content-addressed snapshot store.

I'd also like to tip my hat to Box Backup. I've only used it a little,
because it requires a special server to manage the storage (and I want
to get my backups *off* of my servers), but it also inspires me with
directions I'd like to take Ugarit. It's much more aware of real-time
access to random-access storage than Duplicity, and has a very
interesting continuous background incremental backup mode, moving away
from the tape-based paradigm of backups as something you do on a
special day of the week, like some kind of religious observance. I
hope the author Ben, who is a good friend of mine, won't mind me
plundering his source code for details on how to request real-time
notification of changes from the filesystem, and how to read and write
extended attributes!

Moving on from the world of backup, I'd like to thank the Chicken Team
for producing Chicken Scheme. Felix and the community at #chicken on
Freenode have particularly inspired me with their can-do attitudes to
combining programming-language elegance and pragmatic engineering -
two things many would think un-unitable enemies. Of course, they
didn't do it all themselves - R5RS Scheme and the SRFIs provided a
solid foundation to build on, and there's a cast of many more in the
Chicken community, working on other bits of Chicken or just egging
everyone on. And I can't not thank Henry Baker for writing the seminal
paper on the technique Chicken uses to implement full tail-calling
Scheme with cheap continuations on top of C; Henry already had my
admiration for his work on combining elegance and pragmatism in linear
logic. Why doesn't he return my calls? I even sent flowers.

A special thanks should go to Christian Kellermann for porting Ugarit
to use Chicken 4 modules, too, which was otherwise a big bottleneck to
development, as I was stuck on Chicken 3 for some time!

Thanks to the early adopters who brought me useful feedback, too!

And I'd like to thank my wife for putting up with me spending several
evenings and weekends and holiday days working on this thing...

# Version history

* 1.0: Migrated from gdbm to sqlite for metadata storage, removing the
  GPL taint. Unit test suite. backend-cache made into a separate
  backend binary. Removed backend-log. BUGFIX: file caching uses mtime *and*
  size now, rather than just mtime. Error handling so we skip objects
  that we cannot do something with, and proceed to try the rest of the
  operation.

* 0.8: decoupling backends from the core and into separate binaries,
  accessed via standard input and output, so they can be run over SSH
  tunnels and other such magic.

* 0.7: file cache support, sorting of directories so they're archived
  in canonical order, autoloading of hash/encryption/compression
  modules so they're not required dependencies any more.

* 0.6: .ugarit support.

* 0.5: Keyed hashing so attackers can't tell what blocks you have,
  markers in logs so the index can be reconstructed, sha2 support, and
  passphrase support.

* 0.4: AES encryption.

* 0.3: Added splitlog backend, and fixed a .meta file typo.

* 0.2: Initial public release.

* 0.1: Internal development release.