source: project/release/4/ugarit/trunk/README.txt @ 25566

Last change on this file since 25566 was 25566, checked in by Alaric Snell-Pym, 9 years ago

ugarit: Fixed non-keyed-hash algorithm, and wrote lots of security stuff in the README.

# Introduction

Ugarit is a backup/archival system based around content-addressable storage.

This allows it to upload incremental backups to a remote server or a
local filesystem such as an NFS share or a removable hard disk, yet
have the archive instantly able to produce a full snapshot on demand
rather than needing to download a full snapshot plus all the
incrementals since. The content-addressable storage technique means
that the incrementals can be applied to a snapshot on various kinds of
storage without needing intelligence in the storage itself - so the
snapshots can live within Amazon S3 or on a removable hard disk.

Also, the same storage can be shared between multiple systems that all
back up to it - and the incremental upload algorithm will mean that
any files shared between the servers will only need to be uploaded
once. If you back up a complete server, then go and back up another
that is running the same distribution, then all the files in `/bin`
and so on that are already in the storage will not need to be backed
up again; the system will automatically spot that they're already
there, and not upload them again.

## So what's that mean in practice?

You can run Ugarit to back up any number of filesystems to a shared
archive, and on every backup, Ugarit will only upload files or parts
of files that aren't already in the archive - be they from the
previous snapshot, earlier snapshots, snapshots of entirely unrelated
filesystems, etc. Every time you do a snapshot, Ugarit builds an
entire complete directory tree of the snapshot in the archive - but
reusing any parts of files, files, or entire directories that already
exist anywhere in the archive, and only uploading what doesn't already
exist.

The support for parts of files means that, in many cases, gigantic
files like database tables and virtual disks for virtual machines will
not need to be uploaded entirely every time they change, as the
changed sections will be identified and uploaded.

Because a complete directory tree exists in the archive for any
snapshot, the extraction algorithm is incredibly simple - and,
therefore, incredibly reliable and fast. Simple, reliable, and fast
are just what you need when you're trying to reconstruct the
filesystem of a live server.

Also, it means that you can do lots of small snapshots. If you run a
snapshot every hour, then only a megabyte or two might have changed in
your filesystem, so you only upload a megabyte or two - yet you end up
with a complete history of your filesystem at hourly intervals in the
archive.

Conventional backup systems usually either store a full backup then
incrementals to their archives, meaning that doing a restore involves
reading the full backup then reading every incremental since and
applying them - so to do a restore, you have to download *every
version* of the filesystem you've ever uploaded, or you have to do
periodic full backups (even though most of your filesystem won't have
changed since the last full backup) to reduce the number of
incrementals required for a restore. Better results are had from
systems that use a special backup server to look after the archive
storage, which accept incremental backups and apply them to the
snapshot they keep in order to maintain a most-recent snapshot that
can be downloaded in a single run; but they then restrict you to using
dedicated servers as your archive stores, ruling out cheaply scalable
solutions like Amazon S3, or just backing up to a removable USB or
eSATA disk you attach to your system whenever you do a backup. And
dedicated backup servers are complex pieces of software; can you rely
on something complex for the fundamental foundation of your data
security system?

## System Requirements

Ugarit should run on any POSIX-compliant system that can run [Chicken
Scheme](http://www.call-with-current-continuation.org/). It stores and
restores all the file attributes reported by the `stat` system call -
POSIX mode permissions, UID, GID, mtime, and optionally atime and
ctime (although the ctime cannot be restored due to POSIX
restrictions). Ugarit will store files, directories, block and
character device special files, symlinks, and FIFOs.

Support for extended filesystem attributes - ACLs, alternative
streams, forks and other metadata - is possible, due to the extensible
directory entry format; support for such metadata will be added as
required.

Currently, only local filesystem-based archive storage backends are
complete: these are suitable for backing up to a removable hard disk
or a filesystem shared via NFS or other protocols. However, the
backend can be accessed via an SSH tunnel, so a remote server you are
able to install Ugarit on to run the backends can be used as a remote
archive.

The next backends to be implemented will be one for Amazon S3 and an
SFTP backend for storing archives anywhere you can ssh to. Other
backends will be implemented on demand; an archive can, in
principle, be stored on anything that can store files by name, report
on whether a file already exists, and efficiently download a file by
name. This rules out magnetic tapes due to their requirement for
sequential access.

Although we need to trust that a backend won't lose data (for now), we
don't need to trust the backend not to snoop on us, as Ugarit
optionally encrypts everything sent to the archive.

## Terminology

A Ugarit backend is the software module that handles backend
storage. An archive is an actual storage system storing actual data,
accessed through the appropriate backend for that archive. The backend
may run locally under Ugarit itself, or via an SSH tunnel, on a remote
server where it is installed.

For example, if you use the recommended "splitlog" filesystem backend,
your archive might be `/mnt/bigdisk` on the server `prometheus`. The
backend (which is compiled along with the other filesystem backends in
the `backend-fs` binary) must be installed on `prometheus`, and Ugarit
clients all over the place may then use it via ssh to
`prometheus`. However, even with the filesystem backends, the actual
storage might not be on `prometheus` where the backend runs -
`/mnt/bigdisk` might be an NFS mount, or a mount from a storage-area
network. This ability to delegate via SSH is particularly useful with
the "cache" backend, which reduces latency by storing a cache of what
blocks exist in a backend, thereby making it quicker to identify
already-stored files; a cluster of servers all sharing the same
archive might all use SSH tunnels to access an instance of the "cache"
backend on one of them (using some local disk to store the cache),
which proxies the actual archive storage to an archive on the other
end of a high-latency Internet link, again via an SSH tunnel.

## What's in an archive?

A Ugarit archive contains a load of blocks, each up to a maximum size
(usually 1MiB, although other backends might impose smaller
limits). Each block is identified by the hash of its contents; this is
how Ugarit avoids ever uploading the same data twice, by checking to
see if the data to be uploaded already exists in the archive by
looking up the hash. The contents of the blocks are compressed and
then encrypted before upload.

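As a rough illustration of the idea (written in Python rather than
Ugarit's own Scheme, and not taken from Ugarit's code), a
content-addressed store can be sketched as a table keyed by hash. The
`BlockStore` class and its methods are hypothetical names, SHA-256
stands in for whichever hash is configured, and compression and
encryption are left out:

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressed store (hypothetical, not Ugarit's API)."""

    def __init__(self):
        self.blocks = {}   # hash (hex string) -> block contents
        self.uploads = 0   # number of blocks that actually had to be stored

    def put(self, data: bytes) -> str:
        """Store data if it is not already present and return its hash."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:   # only upload what the archive lacks
            self.blocks[key] = data
            self.uploads += 1
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


store = BlockStore()
first = store.put(b"contents of /bin/ls")
second = store.put(b"contents of /bin/ls")   # same data, e.g. from another server
```

Both calls return the same hash, and the second one stores nothing new,
which is the whole trick behind avoiding duplicate uploads.
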
Every file uploaded is, unless it's small enough to fit in a single
block, chopped into blocks, and each block uploaded. This way, the
entire contents of your filesystem can be uploaded - or, at least,
only the parts of it that aren't already there! The blocks are then
tied together to create a snapshot by uploading blocks full of the
hashes of the data blocks, and directory blocks are uploaded listing
the names and attributes of files in directories, along with the
hashes of the blocks that contain the files' contents. Even the blocks
that contain lists of hashes of other blocks are subject to checking
for pre-existence in the archive; if only a few MiB of your
hundred-GiB filesystem has changed, then even the index blocks and
directory blocks are re-used from previous snapshots.

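The chopping and indexing described above might be sketched like this
in Python. It is a simplification under stated assumptions: the block
size is tiny so the effect is visible, SHA-256 stands in for the
configured hash, the index is a single flat block rather than whatever
structure Ugarit really uses, and `put_block` and `store_file` are
hypothetical names:

```python
import hashlib

BLOCK_SIZE = 4  # tiny, for demonstration only; the text mentions 1MiB as usual


def put_block(archive: dict, data: bytes) -> str:
    """Store a block under its hash unless it is already there."""
    key = hashlib.sha256(data).hexdigest()
    archive.setdefault(key, data)
    return key


def store_file(archive: dict, contents: bytes) -> str:
    """Store contents and return the hash that identifies the whole file."""
    if len(contents) <= BLOCK_SIZE:
        return put_block(archive, contents)
    hashes = [put_block(archive, contents[i:i + BLOCK_SIZE])
              for i in range(0, len(contents), BLOCK_SIZE)]
    # The index block is just the list of data-block hashes, stored like
    # any other block, so it is deduplicated in the same way.
    return put_block(archive, "\n".join(hashes).encode())


archive = {}
v1 = store_file(archive, b"aaaabbbbcccc")
blocks_after_v1 = len(archive)          # three data blocks plus one index block
v2 = store_file(archive, b"aaaabbbbdddd")
blocks_after_v2 = len(archive)          # only "dddd" and a new index were added
```

Changing the last third of the file adds two blocks rather than four:
the unchanged data blocks are found by hash and reused.
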
Once uploaded, a block in the archive is never again changed. After
all, if its contents changed, its hash would change, so it would no
longer be the same block! However, every block has a reference count,
tracking the number of index blocks that refer to it. This means that
the archive knows which blocks are shared between multiple snapshots
(or shared *within* a snapshot - if a filesystem has more than one
copy of the same file, still only one copy is uploaded), so that if a
given snapshot is deleted, then the blocks that only that snapshot is
using can be deleted to free up space, without corrupting other
snapshots by deleting blocks they share. Keep in mind, however, that
not all storage backends may support this - there are certain
advantages to being an append-only archive. For a start, you can't
delete something by accident! The supplied fs backend supports
deletion, while the splitlog backend does not yet. However, the actual
snapshot deletion command hasn't been implemented yet either, so it's
a moot point for now...

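To make the role of reference counts concrete, here is a small Python
sketch. It is an assumption-laden simplification: references are
counted once per snapshot rather than per index block, the block hashes
are made-up strings, and `add_ref` and `drop_ref` are hypothetical
helpers rather than anything in Ugarit:

```python
def add_ref(refcounts: dict, key: str) -> None:
    """Record one more reference to the block with this hash."""
    refcounts[key] = refcounts.get(key, 0) + 1


def drop_ref(refcounts: dict, blocks: dict, key: str) -> None:
    """Remove one reference; delete the block once nothing refers to it."""
    refcounts[key] -= 1
    if refcounts[key] == 0:
        del refcounts[key]
        del blocks[key]


blocks = {"h-shared": b"/bin/ls", "h-only-a": b"old logfile"}
refcounts = {}
snapshot_a = ["h-shared", "h-only-a"]
snapshot_b = ["h-shared"]

for key in snapshot_a + snapshot_b:
    add_ref(refcounts, key)

# Deleting snapshot A frees only the block that no other snapshot uses.
for key in snapshot_a:
    drop_ref(refcounts, blocks, key)
```

After the deletion, `h-shared` is still present with a count of one,
while `h-only-a` has gone.
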
Finally, the archive contains objects called tags. Unlike the blocks,
the tags' contents can change, and they have meaningful names rather
than being identified by hash. Tags identify the top-level blocks of
snapshots within the system, from which (by following the chain of
hashes down through the index blocks) the entire contents of a
snapshot may be found. Unless you happen to have recorded the hash of
a snapshot somewhere, the tags are where you find snapshots from when
you want to do a restore!

Whenever a snapshot is taken, as soon as Ugarit has uploaded all the
files, directories, and index blocks required, it looks up the tag you
have identified as the target of the snapshot. If the tag already
exists, then the snapshot it currently points to is recorded in the
new snapshot as the "previous snapshot"; then the snapshot header,
containing the previous snapshot hash along with the date and time
and any comments you provide for the snapshot, is uploaded (as
another block, identified by its hash). The tag is then updated to
point to the new snapshot.

This way, each tag actually identifies a chronological chain of
snapshots. Normally, you would use a tag to identify a filesystem
being backed up; you'd keep snapshotting the filesystem to the same
tag, resulting in all the snapshots of that filesystem hanging from
the tag. But if you wanted to remember any particular snapshot
(perhaps if it's the snapshot you take before a big upgrade or other
risky operation), you can duplicate the tag, in effect 'forking' the
chain of snapshots much like a branch in a version control system.

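The tag and snapshot chain can be pictured with the following Python
sketch. The header layout (a JSON object with `previous`, `root`,
`time` and `notes` fields) is invented for illustration and is not
Ugarit's real format; `take_snapshot` and `history` are hypothetical
names, and SHA-256 stands in for the configured hash:

```python
import hashlib
import json


def take_snapshot(blocks, tags, tag, root_hash, when, notes=""):
    """Upload a snapshot header that links to the tag's current snapshot."""
    header = {"previous": tags.get(tag), "root": root_hash,
              "time": when, "notes": notes}
    data = json.dumps(header, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    blocks[key] = data      # the header is just another block
    tags[tag] = key         # the tag is the only thing that is mutated
    return key


def history(blocks, tags, tag):
    """Yield snapshot headers for a tag, newest first."""
    key = tags.get(tag)
    while key is not None:
        header = json.loads(blocks[key])
        yield header
        key = header["previous"]


blocks, tags = {}, {}
take_snapshot(blocks, tags, "gandalf-home", "root-1", "2009-01-24 10:28:16")
take_snapshot(blocks, tags, "gandalf-home", "root-2", "2009-01-25 10:28:16")
tags["before-upgrade"] = tags["gandalf-home"]    # duplicate ("fork") the tag
take_snapshot(blocks, tags, "gandalf-home", "root-3", "2009-01-26 10:28:16")

roots_main = [h["root"] for h in history(blocks, tags, "gandalf-home")]
roots_fork = [h["root"] for h in history(blocks, tags, "before-upgrade")]
```

The two tags share their early history, but only `gandalf-home` sees
the snapshot taken after the fork.
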
# Using Ugarit

## Installation

Install [Chicken Scheme](http://www.call-with-current-continuation.org/) using their [installation instructions](http://chicken.wiki.br/Getting%20started#Installing%20Chicken).

Ugarit can then be installed by typing (as root):

    chicken-install ugarit

See the [chicken-install manual](http://wiki.call-cc.org/manual/Extensions#chicken-install-reference) for details if you have any trouble, or wish to install into your home directory.

## Setting up an archive

Firstly, you need to know the archive identifier for the place you'll
be storing your archives. This depends on your backend. The archive
identifier is actually the command line used to invoke the backend for
a particular archive; communication with the archive is via standard
input and output, which is why it's easy to tunnel via ssh.

### Local filesystem backends

These backends use the local filesystem to store the archives. Of
course, the "local filesystem" on a given server might be an NFS mount
or mounted from a storage-area network.

#### Logfile backend

The logfile backend (called `splitlog` in archive identifiers) works
much like the original Venti system. It's append-only - you won't be
able to delete old snapshots from a logfile archive, even when I
implement deletion. It stores the archive in two sets of files; one is
a log of data blocks, split at a specified maximum size, and the other
is the metadata: an sqlite database used to track the location of
blocks in the log files, the contents of tags, and a count of the logs
so a filename can be chosen for a new one.

To set up a new logfile archive, just choose where to put the two
parts. It would be nice to put the metadata file on a different
physical disk to the logs directory, to reduce seeking. If you only
have one disk, you can put the metadata file in the log directory
("metadata" is a good name).

You can then refer to it using the following archive identifier:

      "backend-fs splitlog ...log directory... ...metadata file... max-logfile-size"

For most platforms, a max-logfile-size of 900000000 (900 MB) should
suffice. For now, don't go much bigger than that on 32-bit systems
until Chicken's `file-position` function is fixed to work with files
more than 1GB in size.

#### Filesystem backend

The filesystem backend creates archives by storing each block or tag
in its own file, in a directory. To keep the objects-per-directory
count down, it'll split the files into subdirectories. Because of
this, it uses a stupendous number of inodes (more than the filesystem
being backed up). Only use it if you don't mind that; splitlog is much
more efficient.

To set up a new filesystem-backend archive, just create an empty
directory that Ugarit will have write access to when it runs. It will
probably run as root in order to be able to access the contents of
files that aren't world-readable (although that's up to you), so be
careful of NFS mounts that have `maproot=nobody` set!

You can then refer to it using the following archive identifier:

      "backend-fs fs ...path to directory..."

### Proxying backends

These backends wrap another archive identifier which the actual
storage task is delegated to, but add some value along the way.

#### SSH tunnelling

It's easy to access an archive stored on a remote server. The caveat
is that the backend then needs to be installed on the remote server!
Since archives are accessed by running the supplied command, and then
talking to them via stdin and stdout, the archive identifier need
only be:

      "ssh ...hostname... '...remote archive identifier...'"

#### Cache backend

The cache backend is used to cache a list of what blocks exist in the
proxied backend, so that it can answer queries as to the existence of
a block rapidly, even when the proxied backend is on the end of a
high-latency link (e.g. the Internet). This should speed up snapshots,
as existing files are identified by asking the backend if the archive
already has them.

The cache backend works by storing the cache in a local sqlite
file. Given a place for it to store that file, usage is simple:

      "backend-cache ...path to cachefile... '...proxied archive identifier...'"

The cache file will be automatically created if it doesn't already
exist, so make sure there's write access to the containing directory.

 - WARNING - WARNING - WARNING - WARNING - WARNING - WARNING -

If you use a cache on an archive shared between servers, make sure
that you either:

 * Never delete things from the archive

or

 * Make sure all access to the archive is via the same cache

If a block is deleted from an archive, and a cache on that archive is
not aware of the deletion (as it did not go "through" the caching
proxy), then the cache will record that the block exists in the
archive when it does not. This will mean that if a snapshot is made
through the cache that would use that block, then it will be assumed
that the block already exists in the archive when it does
not. Therefore, the block will not be uploaded, and a dangling
reference will result!

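The hazard can be reproduced in a few lines of Python. This is only a
model of the behaviour described above: `Archive` and `CachingProxy`
are hypothetical classes, the cache is a plain in-memory set rather
than an sqlite file, and the block hash is a made-up string:

```python
class Archive:
    """Stand-in for the real archive storage."""

    def __init__(self):
        self.blocks = {}

    def exists(self, key):
        return key in self.blocks

    def put(self, key, data):
        self.blocks[key] = data

    def delete(self, key):
        self.blocks.pop(key, None)


class CachingProxy:
    """Remembers which blocks exist so it can answer without asking the archive."""

    def __init__(self, backend):
        self.backend = backend
        self.known = set()

    def exists(self, key):
        if key in self.known:
            return True                    # answered from the cache
        found = self.backend.exists(key)
        if found:
            self.known.add(key)
        return found

    def put(self, key, data):
        self.backend.put(key, data)
        self.known.add(key)

    def delete(self, key):
        self.backend.delete(key)
        self.known.discard(key)            # deletions via the proxy stay in sync


archive = Archive()
cache = CachingProxy(archive)
cache.put("h1", b"block contents")

archive.delete("h1")                       # deletion that bypasses the cache

cache_says = cache.exists("h1")            # stale answer from the cache
archive_says = archive.exists("h1")        # the truth
```

Here the cache still claims the block exists, so a snapshot made
through it would skip the upload and leave a dangling reference.
Routing the deletion through `cache.delete` instead keeps the two in
agreement.
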
Some setups which *are* safe:

 * A single server using an archive via a cache, not sharing it with
   anyone else.

 * A pool of servers using an archive via the same cache.

 * A pool of servers using an archive via one or more caches, and
   maybe some not via the cache, where nothing is ever deleted from
   the archive.

 * A pool of servers using an archive via one cache, and maybe some
   not via the cache, where deletions are only performed on servers
   using the cache, so the cache is always aware.

## Writing a ugarit.conf

`ugarit.conf` should look something like this:

      (storage <archive identifier>)
      (hash tiger "<salt>")
      [double-check]
      [(compression [deflate|lzma])]
      [(encryption aes <key>)]
      [(file-cache "<path>")]
      [(rule ...)]

The hash line chooses a hash algorithm. Currently Tiger-192 (`tiger`),
SHA-256 (`sha256`), SHA-384 (`sha384`) and SHA-512 (`sha512`) are
supported; if you omit the line then Tiger will still be used, but it
will be a simple hash of the block with the block type appended, which
reveals to attackers what blocks you have (as the hash is of the
unencrypted block, and the hash is not encrypted). This is useful for
development and testing or for use with trusted archives, but not
advised for use with archives that attackers may snoop on. Providing a
salt string produces a hash function that hashes the block, the type
of block, and the salt string, producing hashes that attackers who can
snoop the archive cannot use to find known blocks (see the "Security
model" section below for more details).

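The difference between the plain and the salted hash can be sketched in
Python. The exact way Ugarit combines block, type and salt is not
spelled out here, so this sketch assumes HMAC as one reasonable keyed
construction; SHA-256 is used because Tiger is not in Python's standard
library, and both function names are hypothetical:

```python
import hashlib
import hmac


def plain_block_id(block: bytes, block_type: str) -> str:
    """Unkeyed identifier: anyone holding the same block can compute it."""
    return hashlib.sha256(block + block_type.encode()).hexdigest()


def keyed_block_id(block: bytes, block_type: str, salt: bytes) -> str:
    """Keyed identifier: depends on a salt that only the owner knows."""
    return hmac.new(salt, block + block_type.encode(), hashlib.sha256).hexdigest()


known_file = b"a well-known document"
my_id = keyed_block_id(known_file, "file", b"my secret salt")
attacker_guess = plain_block_id(known_file, "file")
other_salt_id = keyed_block_id(known_file, "file", b"someone else's salt")
```

An attacker who has the file but not the salt computes a different
identifier, so comparing hashes against the archive reveals nothing.
The same holds between two archives that use different salts.
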
I would recommend that you create a salt string from a secure entropy
source, such as:

    dd if=/dev/random bs=1 count=64 | base64 -w 0

Whichever hash function you use, you will need to install the required
Chicken egg with one of the following commands:

    chicken-install -s tiger-hash  # for tiger
    chicken-install -s sha2        # for the SHA hashes

`double-check`, if present, causes Ugarit to perform extra internal
consistency checks during backups, which will detect bugs but may slow
things down.

`lzma` is the recommended compression option for low-bandwidth
backends or when space is tight, but it's very slow to compress;
deflate or no compression at all are better for fast local
archives. To have no compression at all, just remove the `(compression
...)` line entirely. Likewise, to use compression, you need to install
a Chicken egg:

       chicken-install -s z3       # for deflate
       chicken-install -s lzma     # for lzma

Likewise, the `(encryption ...)` line may be omitted to have no
encryption; the only currently supported algorithm is aes (in CBC
mode) with a key given in hex, as a passphrase (hashed to get a key),
or a passphrase read from the terminal on every run. The key may be
16, 24, or 32 bytes for 128-bit, 192-bit or 256-bit AES. To specify a
hex key, just supply it as a string, like so:

      (encryption aes "00112233445566778899AABBCCDDEEFF")

...for 128-bit AES,

      (encryption aes "00112233445566778899AABBCCDDEEFF0011223344556677")

...for 192-bit AES, or

      (encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")

...for 256-bit AES.

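If you want to double-check a hex key before putting it in the
configuration file, the arithmetic is simply two hex digits per byte.
A small Python helper (hypothetical, not part of Ugarit) makes the
relationship explicit:

```python
def aes_bits_for_hex_key(hex_key: str) -> int:
    """Return the AES key size in bits implied by a hex-encoded key."""
    key = bytes.fromhex(hex_key)           # two hex digits per byte
    if len(key) not in (16, 24, 32):
        raise ValueError("AES keys must be 16, 24 or 32 bytes long")
    return len(key) * 8


bits_128 = aes_bits_for_hex_key("00112233445566778899AABBCCDDEEFF")
bits_192 = aes_bits_for_hex_key("00112233445566778899AABBCCDDEEFF0011223344556677")
bits_256 = aes_bits_for_hex_key(
    "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")
```

The three example keys above are 32, 48 and 64 hex digits long, giving
16, 24 and 32 bytes respectively.
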
Alternatively, you can provide a passphrase, and specify how large a
key you want it turned into, like so:

      (encryption aes ([16|24|32] "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))

I would recommend that you generate a long passphrase from a secure
entropy source, such as:

    dd if=/dev/random bs=1 count=64 | base64 -w 0

Finally, the extra-paranoid can request that Ugarit prompt for a
passphrase on every run and hash it into a key of the specified
length, like so:

      (encryption aes ([16|24|32] prompt))

(note the lack of quotes around `prompt`, distinguishing it from a passphrase)

Please read the "Security model" section below for details on the
implications of different encryption setups.

Again, as it is an optional feature, to use encryption, you must
install the appropriate Chicken egg:

       chicken-install -s aes

A file cache, if enabled, significantly speeds up subsequent snapshots
of a filesystem tree. The file cache is a file (which Ugarit will
create if it doesn't already exist) mapping filenames to
(mtime,size,hash) tuples; as it scans the filesystem, if it finds a
file in the cache and the mtime and size have not changed, it will
assume it is already archived under the specified hash. This saves it
from having to read the entire file to hash it and then check if the
hash is present in the archive. In other words, if only a few files
have changed since the last snapshot, then snapshotting a directory
tree becomes an O(N) operation, where N is the number of files, rather
than an O(M) operation, where M is the total size of files involved.

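The lookup logic can be sketched in Python as follows. It models the
cache as an in-memory dictionary rather than a file on disk, uses
SHA-256 in place of the configured hash, and hashes whole files rather
than blocks; `hash_file` and `cached_hash` are hypothetical names:

```python
import hashlib
import os
import tempfile


def hash_file(path):
    """Read the whole file and hash it (the expensive step)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def cached_hash(cache, path):
    """Return (hash, cache_hit), re-reading the file only if it looks changed."""
    st = os.stat(path)
    entry = cache.get(path)
    if entry is not None and entry[0] == st.st_mtime and entry[1] == st.st_size:
        return entry[2], True              # mtime and size unchanged: trust the cache
    digest = hash_file(path)
    cache[path] = (st.st_mtime, st.st_size, digest)
    return digest, False


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    cache = {}
    first = cached_hash(cache, path)       # miss: the file is read and hashed
    second = cached_hash(cache, path)      # hit: only a stat call is needed
```

On the second call only `os.stat` touches the file, which is why the
cost scales with the number of files rather than their total size.
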
For example:

      (storage "ssh ugarit@spiderman 'backend-fs splitlog /mnt/ugarit-data /mnt/ugarit-metadata/metadata 900000000'")
      (hash tiger "i3HO7JeLCSa6Wa55uqTRqp4jppUYbXoxme7YpcHPnuoA+11ez9iOIA6B6eBIhZ0MbdLvvFZZWnRgJAzY8K2JBQ")
      (encryption aes (32 "FN9m34J4bbD3vhPqh6+4BjjXDSPYpuyskJX73T1t60PP0rPdC3AxlrjVn4YDyaFSbx5WRAn4JBr7SBn2PLyxJw"))
      (compression lzma)
      (file-cache "/var/ugarit/cache")

Be careful to put a set of parentheses around each configuration
entry. White space isn't significant, so feel free to indent things
and wrap them over lines if you want.

Keep copies of this file safe - you'll need it to do extractions!
Print a copy out and lock it in your fire safe! Ok, currently, you
might be able to recreate it if you remember where you put the
storage, but encryption keys and hash salts are harder to remember...

## Your first backup

Think of a tag to identify the filesystem you're backing up. If it's
`/home` on the server `gandalf`, you might call it `gandalf-home`. If
it's the entire filesystem of the server `bilbo`, you might just call
it `bilbo`.

Then from your shell, run (as root):

      # ugarit snapshot <ugarit.conf> [-c] [-a] <tag> <path to root of filesystem>

For example, if we have a `ugarit.conf` in the current directory:

      # ugarit snapshot ugarit.conf -c localhost-etc /etc

Specify the `-c` flag if you want to store ctimes in the archive;
since it's impossible to restore ctimes when extracting from an
archive, doing this is useful only for informational purposes, so it's
not done by default. Similarly, atimes aren't stored in the archive
unless you specify `-a`, because otherwise, there will be a lot of
directory blocks uploaded on every snapshot, as the atime of every
file will have been changed by the previous snapshot - so with `-a`
specified, on every snapshot, every directory in your filesystem will
be uploaded! Ugarit will happily restore atimes if they are found in
an archive; their storage is made optional simply because uploading
them is costly and rarely useful.

## Exploring the archive

Now that you have a backup, you can explore the contents of the
archive. This need not be done as root, as long as you can read
`ugarit.conf`; however, if you want to extract files, run it as root
so the uids and gids can be set.

      $ ugarit explore <ugarit.conf>

This will put you into an interactive shell exploring a virtual
filesystem. The root directory contains an entry for every tag; if you
type `ls` you should see your tag listed, and within that tag, you'll
find a list of snapshots, in descending date order, with a special
entry `current` for the most recent snapshot. Within a snapshot,
you'll find the root directory of your snapshot, and will be able to
`cd` into subdirectories, and so on:

      > ls
      Test <tag>
      > cd Test
      /Test> ls
      2009-01-24 10:28:16 <snapshot>
      2009-01-24 10:28:16 <snapshot>
      current <snapshot>
      /Test> cd current
      /Test/current> ls
      README.txt <file>
      LICENCE.txt <symlink>
      subdir <dir>
      .svn <dir>
      FIFO <fifo>
      chardev <character-device>
      blockdev <block-device>
      /Test/current> ls -ll LICENCE.txt
      lrwxr-xr-x 1000 100 2009-01-15 03:02:49 LICENCE.txt -> subdir/LICENCE.txt
      target: subdir/LICENCE.txt
      ctime: 1231988569.0

As well as exploring around, you can also extract files or directories
(or entire snapshots) by using the `get` command. Ugarit will do its
best to restore the metadata of files, subject to the rights of the
user you run it as.

Type `help` to get help in the interactive shell.

## Duplicating tags

As mentioned above, you can duplicate a tag, creating two tags that
refer to the same snapshot and its history but that can then have
their own subsequent history of snapshots applied to each
independently, with the following command:

      $ ugarit fork <ugarit.conf> <existing tag> <new tag>

## `.ugarit` files

By default, Ugarit will archive everything it finds in the filesystem
tree you tell it to snapshot. However, this might not always be
desired; so we provide the facility to override this with `.ugarit`
files, or global rules in your `.conf` file.

Note: The syntax of these files is provisional, as I want to
experiment with usability, as the current syntax is ugly. So please
don't be surprised if the format changes in incompatible ways in
subsequent versions!

In quick summary, if you want to ignore all files or directories
matching a glob in the current directory and below, put the following
in a `.ugarit` file in that directory:

      (* (glob "*~") exclude)

You can write quite complex expressions as well as just globs. The
full set of rules is:

* `(glob "`*pattern*`")` matches files and directories whose names
  match the glob pattern

* `(name "`*name*`")` matches files and directories with exactly that
  name (useful for files called `*`...)

* `(modified-within ` *number* ` seconds)` matches files and
  directories modified within the given number of seconds

* `(modified-within ` *number* ` minutes)` matches files and
  directories modified within the given number of minutes

* `(modified-within ` *number* ` hours)` matches files and directories
  modified within the given number of hours

* `(modified-within ` *number* ` days)` matches files and directories
  modified within the given number of days

* `(not ` *rule*`)` matches files and directories that do not match
  the given rule

* `(and ` *rule* *rule...*`)` matches files and directories that match
  all the given rules

* `(or ` *rule* *rule...*`)` matches files and directories that match
  any of the given rules

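To show how such rules compose, here is a small evaluator in Python. It
is only a model of the semantics listed above: rules are written as
nested tuples instead of s-expressions, Python's `fnmatch` is assumed
to be close enough to Ugarit's glob matching, the file's age is passed
in rather than read from the filesystem, and `matches` is a
hypothetical name:

```python
import fnmatch

SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def matches(rule, name, age_seconds=0):
    """Return True if a file called `name`, modified `age_seconds` ago, matches."""
    op = rule[0]
    if op == "glob":
        return fnmatch.fnmatchcase(name, rule[1])
    if op == "name":
        return name == rule[1]             # exact comparison, no wildcards
    if op == "modified-within":
        return age_seconds <= rule[1] * SECONDS_PER_UNIT[rule[2]]
    if op == "not":
        return not matches(rule[1], name, age_seconds)
    if op == "and":
        return all(matches(r, name, age_seconds) for r in rule[1:])
    if op == "or":
        return any(matches(r, name, age_seconds) for r in rule[1:])
    raise ValueError("unknown rule: %r" % (op,))


backup_files = ("glob", "*~")
literal_star = ("name", "*")
recent_logs = ("and", ("glob", "*.log"), ("modified-within", 2, "hours"))
```

For instance, `matches(backup_files, "notes.txt~")` is true, while
`matches(literal_star, "notes.txt")` is false because `name` compares
the string exactly rather than treating `*` as a wildcard.
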
Also, you can override a previous exclusion with an explicit include
in a lower-level directory:

    (* (glob "*~") include)

You can bind rules to specific directories, rather than to "this
directory and all beneath it", by specifying an absolute or relative
path instead of the `*`:

    ("/etc" (name "passwd") exclude)

If you use a relative path, it's taken relative to the directory of
the `.ugarit` file.

You can also put some rules in your `.conf` file, although relative
paths are illegal there, by adding lines of this form to the file:

    (rule * (glob "*~") exclude)

# Questions and Answers

## What happens if a snapshot is interrupted?

Nothing! Whatever blocks have been uploaded will stay uploaded, but the
snapshot is only added to the tag once the entire filesystem has been
snapshotted. So just start the snapshot again. Any files that have
already been uploaded will then not need to be uploaded again, so the
second snapshot should proceed quickly to the point where it failed
before, and continue from there.

Unless the archive ends up with a partially-uploaded corrupted block
due to being interrupted during upload, you'll be fine. The filesystem
backend has been written to avoid this by writing the block to a file
with the wrong name, then renaming it to the correct name when it's
entirely uploaded.

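The write-then-rename technique looks roughly like this in Python. The
`.partial` suffix, the flat directory layout and the function name are
assumptions made for the sketch, not details of the real filesystem
backend:

```python
import os
import tempfile


def write_block_atomically(directory, name, data):
    """Write data under a temporary name, then rename it into place."""
    tmp_path = os.path.join(directory, name + ".partial")
    final_path = os.path.join(directory, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())               # make sure the bytes are on disk first
    os.replace(tmp_path, final_path)       # atomic on POSIX within one filesystem
    return final_path


with tempfile.TemporaryDirectory() as archive_dir:
    final_path = write_block_atomically(archive_dir, "abc123", b"block contents")
    names = os.listdir(archive_dir)
    with open(final_path, "rb") as f:
        stored = f.read()
```

If the process is interrupted before the rename, only the `.partial`
file exists, so a reader looking for the block by its real name never
sees a half-written one.
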
Actually, there is *one* caveat: blocks that were uploaded, but never
make it into a finished snapshot, will be marked as "referenced" but
there's no snapshot to delete to un-reference them, so they'll never
be removed when you delete snapshots. (Not that snapshot deletion is
implemented yet, mind). If this becomes a problem for people, we could
write a "garbage collect" tool that regenerates the reference counts
in an archive, leading to unused blocks (with a zero refcount) being
unlinked.

## Should I share a single large archive between all my filesystems?

I think so. Using a single large archive means that blocks shared
between servers - e.g. software installed from packages and that sort
of thing - will only ever need to be uploaded once, saving storage
space and upload bandwidth. However, do not share an archive between
servers that do not mutually trust each other, as they can all update
the same tags, so can meddle with each other's snapshots - and read
each other's snapshots.

# Security model

I have designed and implemented Ugarit to be able to handle cases
where the actual archive storage is not entirely trusted.

However, security involves tradeoffs, and Ugarit is configurable in
ways that affect its resistance to different kinds of attacks. Here I
will list different kinds of attack and explain how Ugarit can deal
with them, and how you need to configure it to gain that
protection.

## Archive snoopers

This might be somebody who can intercept Ugarit's communication with
the archive at any point, or who can read the archive itself at their
leisure.

Ugarit's splitlog backend creates files with "rw-------" permissions
out of the box to try and prevent this. This is a pain for people who
want to share archives between UIDs, but we can add a configuration
option to override this if that becomes a problem.

### Reading your data

If you enable encryption, then all the blocks sent to the archive are
encrypted using a secret key stored in your Ugarit configuration
file. As long as that configuration file is kept safe, and the AES
algorithm is secure, then attackers who can snoop the archive cannot
decode your data blocks. Enabling compression will also help, as the
blocks are compressed before encrypting, which is thought to make
cryptographic analysis harder.

Recommendations: Use compression and encryption when there is a risk
of archive snooping. Keep your Ugarit configuration file safe using
UNIX file permissions (make it readable only by root), and maybe store
it on a removable device that's only plugged in when
required. Alternatively, use the "prompt" passphrase option, and be
prompted for a passphrase every time you run Ugarit, so it isn't
stored on disk anywhere.

### Looking for known hashes

A block is identified by the hash of its content (before compression
and encryption). If an attacker were trying to find people who own a
particular file (perhaps a piece of subversive literature), they could
search Ugarit archives for its hash.

However, Ugarit has the option to "key" the hash with a "salt" stored
in the Ugarit configuration file. This means that the hashes used are
actually a hash of the block's contents *and* the salt you supply. If
you do this with a random salt that you keep secret, then attackers
can't check your archive for known content just by comparing the hashes.
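
As a rough illustration of why a secret salt defeats known-hash
searches, here is a minimal Python sketch. It is not Ugarit's code:
SHA-256 stands in for whichever hash the archive is configured with,
and the `block_key` helper, the order in which content and salt are
combined, and the variable names are all assumptions made for the
example.

```python
import hashlib
import os


def block_key(content: bytes, salt: bytes = b"") -> str:
    # Hypothetical helper: hash the block content together with the salt.
    # SHA-256 is a stand-in; the real archive may use a different hash.
    return hashlib.sha256(content + salt).hexdigest()


known_file = b"contents of some well-known file"
salt = os.urandom(32)  # stands in for the secret kept in the config file

unsalted_key = block_key(known_file)
salted_key = block_key(known_file, salt)

# An attacker who owns a copy of the file can compute the plain hash...
attacker_guess = hashlib.sha256(known_file).hexdigest()

# ...which finds the block in an unsalted archive, but not in a salted one.
print("found in unsalted archive:", attacker_guess == unsalted_key)
print("found in salted archive:  ", attacker_guess == salted_key)
```

The same content always maps to the same key for a given salt, so
deduplication within one archive is unaffected.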

Recommendations: Provide a secret string to your hash function in your
Ugarit configuration file. Keep the Ugarit configuration file safe, as
per the advice in the previous point.

## Archive modifiers

These folks can modify Ugarit's writes into the archive, its reads
back from the archive, or can modify the archive itself at their leisure.

Modifying an encrypted block without knowing the encryption key can at
worst be a denial of service, corrupting the block in an unknown
way. An attacker who knows the encryption key could replace a block
with valid-seeming but incorrect content. In the worst case, this
could exploit a bug in the decompression engine, causing a crash or
even an exploit of the Ugarit process itself (thereby gaining the
powers of a process inspector, as documented below). We can but hope
that the decompression engine is robust. Exploits of the decryption
engine, or other parts of Ugarit, are less likely due to the nature of
the operations performed upon them.

However, if a block is modified, then when Ugarit reads it back, the
hash will no longer match the hash Ugarit requested, which will be
detected and an error reported. The hash is checked after
decryption and decompression, so this check does not protect us
against exploits of the decompression engine.

This protection is only afforded when the hash Ugarit asks for is not
tampered with. Most hashes are obtained from within other blocks,
which are therefore safe unless that block has been tampered with; the
nature of the hash tree conveys the trust in the hashes up to the
root. The root hashes are stored in the archive as "tags", which an
archive modifier could alter at will. Therefore, the tags cannot be
trusted if somebody might modify the archive. This is why Ugarit
prints out the snapshot hash and the root directory hash after
performing a snapshot, so you can record them securely outside of the
archive.
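
The way trust flows down from a root hash can be sketched in a few
lines of Python. This is a toy model, not Ugarit's block format: the
one-byte `D`/`F` type prefix, the newline-separated child keys, the use
of SHA-256, and the `put`, `get_verified` and `verify_tree` helpers are
all assumptions made for illustration.

```python
import hashlib


def key_of(data: bytes) -> str:
    # SHA-256 stands in for the archive's configured hash.
    return hashlib.sha256(data).hexdigest()


def put(store: dict, data: bytes) -> str:
    key = key_of(data)
    store[key] = data
    return key


def get_verified(store: dict, key: str) -> bytes:
    # Re-hash what came back and compare it with the key we asked for.
    data = store[key]
    if key_of(data) != key:
        raise ValueError("block %s... does not match its hash" % key[:12])
    return data


def verify_tree(store: dict, root_key: str) -> int:
    # Toy format: b"F" + bytes is a file block, b"D" + newline-separated
    # keys is a directory block. Returns the number of blocks verified.
    data = get_verified(store, root_key)
    checked = 1
    if data[:1] == b"D":
        for child in data[1:].decode("ascii").split():
            checked += verify_tree(store, child)
    return checked


store = {}
f1 = put(store, b"Fhello")
f2 = put(store, b"Fworld")
root = put(store, b"D" + (f1 + "\n" + f2).encode("ascii"))
recorded_root = root  # written down somewhere outside the archive

print("blocks verified:", verify_tree(store, recorded_root))

store[f2] = b"Ftampered"  # an archive modifier rewrites a block
try:
    verify_tree(store, recorded_root)
except ValueError as err:
    print("detected:", err)
```

A tree that an attacker builds from scratch would verify against its
own root, which is why the key passed in has to come from the record
kept outside the archive rather than from a tag.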

The most likely threat posed by archive modifiers is that they could
simply corrupt or delete all of your archive, without needing to know
any encryption keys.

Recommendations: Secure your archives against modifiers, by whatever
means possible. If archive modifiers are still a potential threat,
write down a log of your root directory hashes from each snapshot, and keep
it safe. When extracting your backups, use the `ls -ll` command in the
interface to check the "contents" hash of your snapshots, and check
they match the root directory hash you expect.

## Process inspectors

These folks can attach debuggers or similar tools to running
processes, such as Ugarit itself.

Ugarit backend processes only see encrypted data, so people who can
attach to that process gain the powers of archive snoopers and
modifiers, and the same conditions apply.

People who can attach to the Ugarit process itself, however, will see
the original unencrypted content of your filesystem, and will have
full access to the encryption keys and hashing keys stored in your
Ugarit configuration. When Ugarit is running with sufficient
permissions to restore backups, they will be able to intercept and
modify the data as it comes out, and probably gain total write access
to your entire filesystem in the process.

Recommendations: Ensure that Ugarit does not run under the same user
ID as untrusted software. In many cases it will need to run as root in
order to gain unfettered access to read the filesystems it is backing
up, or to restore the ownership of files. However, when all the files
it backs up are world-readable, it could run as an untrusted user for
backups, and where file ownership is trivially reconstructible, it can
do restores as a limited user, too.

## Attackers in the source filesystem

These folks create files that Ugarit will back up one day. By having
write access to your filesystem, they already have some level of
power, and standard Unix security practices such as storage quotas
should be used to control them. They may be people with logins on your
box, or more subtly, people who can cause servers to write files;
somebody who sends an email to your mailserver will probably cause
that message to be written to queue files, as will people who can
upload files via any means.

Such attackers might use up your available storage by creating large
files. This creates a problem in the actual filesystem, but that
problem can be fixed by deleting the files. If those files get
archived into Ugarit, then they are a part of that snapshot. If you
are using a backend that supports deletion, then (when I implement
snapshot deletion in the user interface) you could delete that entire
snapshot to recover the wasted space, but that is a rather serious
operation.

More insidiously, such attackers might attempt to abuse a hash
collision in order to fool the archive. If they have a way of creating
a file that, for instance, has the same hash as your shadow password
file, then Ugarit will think that it already has that file when it
attempts to snapshot it, and store a reference to the existing
file. If that snapshot is restored, then they will receive a copy of
your shadow password file. Similarly, if they can predict a future
hash of your shadow password file, and create a shadow password file
of their own (perhaps one giving them a root account with a known
password) with that hash, they can then wait for the real shadow
password file to have that hash. If the system is later restored from
that snapshot, then their chosen content will appear in the shadow
password file. However, doing this requires a very fundamental break
of the hash function being used.

Recommendations: Think carefully about who has write access to your
filesystems, directly or indirectly via a network service that stores
received data to disk. Enforce quotas where appropriate, and consider
not backing up "queue directories" where untrusted content might
appear; migrate incoming content that passes acceptance tests to an
area that is backed up. If necessary, the queue might be backed up to
a non-snapshotting system, such as rsyncing to another server, so that
any excessive files that appear in there are removed from the backup
in due course, while still affording protection.

# Future Directions

Here's a list of planned developments, in approximate priority order:

## General

* More checks with `double-check` mode activated. Perhaps read blocks
  back from the archive to check it matches the blocks sent, to detect
  hash collisions. Maybe have levels of double-check-ness.

* Migrate the source repo to Fossil (when there's a
  kitten-technologies.co.uk migration to Fossil), and update the egg
  locations thingy.

* Profile the system. As of 1.0.1, having done the periodic SQLite
  commits improvement, Ugarit is doing around 250KiB/sec on my home
  fileserver, but using 87% CPU in the ugarit process and 25% in the
  backend-fs process, when dealing with large files (so full 1MiB
  blocks are being processed). This suggests that the main
  block-handling loop in `store-file!` is less than efficient; reading
  via `current-input-port` rather than using the POSIX egg `file-read`
  functions may be a mistake, and there is probably more copying afoot
  than we need.
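
One reading of the `double-check` idea in the list above can be
sketched as follows. This is an illustrative Python model rather than
Ugarit's implementation; the `put_block` helper, the dictionary
standing in for a backend, and the pluggable `hash_fn` parameter
(present only so that a collision can be forced with a deliberately
weak hash) are all assumptions.

```python
import hashlib


class HashCollision(Exception):
    pass


def sha256_key(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def put_block(store, data, hash_fn=sha256_key, double_check=False):
    """Store data under its hash; return (key, uploaded)."""
    key = hash_fn(data)
    uploaded = key not in store
    if uploaded:
        store[key] = data
    # In double-check mode, read the block back and compare it byte for
    # byte, whether it was just written or was already present.
    if double_check and store[key] != data:
        raise HashCollision(key)
    return key, uploaded


def weak_hash(data: bytes) -> str:
    # Deliberately terrible "hash" so that collisions are easy to produce.
    return str(len(data))


store = {}
put_block(store, b"abc", hash_fn=weak_hash)

# Without the check, different content with the same key is silently
# treated as a duplicate and never uploaded.
print(put_block(store, b"xyz", hash_fn=weak_hash))

try:
    put_block(store, b"xyz", hash_fn=weak_hash, double_check=True)
except HashCollision as err:
    print("collision detected on key", err)
```

The cost is one extra read per block, which is presumably why the
item suggests making the level of checking configurable.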

## Backends

* Create ugarit-backend-protocol-2, and extend import-backend to
  support it. The differences are:

  * Extend the backend API to have all API calls return a possibly
    empty list of log messages before the actual result. When
    importing a backend, provide a logging callback which is passed
    these lists and feeds them into a logging mechanism which prints
    them and stores them in the archive object for later logging into
    the snapshot. The same logging interface can then be used for
    warnings from within ugarit-core itself as well.

  * Extend the backend API to have an initial list of log messages and
    a possible error or success for initialisation, inside the
    header. Make the command-line wrappers for backends use this to
    indicate startup failure.

* Carefully document backend API for other backend authors: in
  particular note behaviour in crash situations - we assume that after
  a successful `flush!` all previous blocks are safe, and that for
  blocks written since the last flush, if some blocks make it, then
  all blocks written before them must have too. That is, writes are
  in effect done in order and periodically auto-flushed. This
  invariant is required for the file-cache to be safe (see v1.0.2).

* Lock the archive for writing in backend-splitlog, so that two
  snapshots to the same archive don't collide. Do we lock per `put!`
  to allow interleaving, or is that too inefficient? In which case, we
  need to hold a lock that persists for a while, and release it
  periodically to allow other writers to the same archive to have a
  chance.

* Make backend-splitlog write the current log file offset as well as
  number into the metadata on each flush, and on startup, either
  truncate the file to that position (to remove anything written but
  not flushed to the metadata) or scan the log onwards from that point
  to find (complete) blocks that did not get flushed to the metadata.

* Make `lock-tag!` fail if the tag is already locked. Make the archive
  block and retry a few times in that case.

* Extend the backend protocol with a special "admin" command that
  allows for arbitrary backend-specific operations, and write an
  ugarit-backend-admin CLI tool to administer backends with it. The
  input should be a single s-expression as a list, and the result
  should be an alist which is displayed to the user in a friendly
  manner, as "Key: Value\n" lines.

* Implement "info" admin commands for all backends, that list any
  available stats, and at least the backend type and parameters.

* Support for recreating the index and tags on a backend-splitlog if
  they get corrupted, from the headers left in the log, as a "reindex"
  admin command.

* Support for flushing the cache on a backend-cache, via an admin
  command, rather than having to delete the cache file.

* Support for unlinking in backend-splitlog, by marking byte ranges as
  unused in the metadata (and by touching the headers in the log so we
  maintain the invariant that the metadata is a reconstructible cache)
  and removing the entries for the unlinked blocks, perhaps provide an
  option to attempt to re-use existing holes to put blocks in for
  online reuse, and provide an offline compaction operation. Keep
  stats in the index of how many byte ranges are unused, and how many
  bytes unused, in each file, and report them in the info admin
  interface, along with the option to compact any or all files. We'll
  need to store refcounts in the backend metadata (should we log
  reuses, then, so the metadata can always be reconstructed, or just
  set them to NULL on a reconstruct); when this is enabled on an
  existing archive with no refcounts, default them to NULL, and treat
  a NULL refcount as "infinity".

* Have read-only and unlinkable and block size config flags in the
  backend-splitlog metadata file, settable via admin commands.

* For people doing remote backups who want to not hog resources, write
  a proxy backend that throttles bandwidth usage. Make it record the
  time it last sent a request to the backend, and the number of bytes
  read and written; then when a new request comes in, delay it until
  at least the largest of (write bandwidth quota * bytes written) and
  (read bandwidth quota * bytes read) seconds have passed since the
  last request was sent. NOTE: Start the clock when SENDING, so the
  time spent handling the request is already counting towards
  bandwidth quotas, or it won't be fair.

* Support for SFTP as a storage backend. Store one file per block, as
  per `backend-fs`, but remotely. See
  http://tools.ietf.org/html/draft-ietf-secsh-filexfer-13 for SFTP
  protocol specs; popen an `ssh -s HOST sftp` connection to the server,
  then talk that simple binary protocol. Tada! Ideally make an sftp egg,
  then a "ugarit-backend-sftp" egg to keep the dependencies optional.

* Support for S3 as a storage backend. There is now an S3 egg! Make an
  "ugarit-backend-s3" egg to keep the dependencies optional.

* Support for replicated archives. This will involve a special storage
  backend that can wrap any number of other archives, each tagged with
  a trust percentage and read and write load weightings. Each block
  will be uploaded to enough archives to make the total trust be at
  least 100%, by randomly picking the archives weighted by their write
  load weighting. A read-only archive automatically gets its write
  load weighting set to zero, and a warning issued if it was
  configured otherwise. A local cache will be kept of which backends
  carry which blocks, and reads will be serviced by picking the
  archive that carries it and has the highest read load weighting. If
  that archive is unavailable or has lost the block, then they will be
  tried in read load order; and if none of them have it, an exhaustive
  search of all available archives will be performed before giving up,
  and the cache updated with the results if the block is found. In
  order to correctly handle archives that were unavailable during
  this, we might need to log an "unknown" for that block key / archive
  pair, rather than assuming the block is not there, and check it
  later. Users will be given an admin command to notify the backend of
  an archive going missing forever, which will cause it to be removed
  from the cache. Affected blocks should be examined and re-replicated
  if their replication count is now too low. Another command should be
  available to warn of impending deliberate removal, which will again
  remove the archive from the cluster and re-replicate, the difference
  being that the disappearing archive is usable for re-replicating
  FROM, so this is a safe operation for blocks that are only on that
  one archive. The individual physical archives that we put
  replication on top of won't be "valid" archives unless they are 100%
  replicated, as they'll contain references to blocks that are on
  other archives. It might be a good idea to mark them as such with a
  special tag to avoid people trying to restore directly from them;
  the frontend should complain if you attempt to directly use an
  archive with the special tag in place. A copy of the replication
  configuration could be stored under a special tag to mark this fact,
  and to enable easy finding of the proper replicated archive to work
  from. There should be a configurable option to snapshot the cache to
  the archives whenever the replicated archive is closed, too. The
  command line to the backend, "backend-replicated", should point to
  an sqlite file for the configuration and cache, and users should use
  admin commands to add/remove/modify archives in the cluster.
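
The delay rule described for the throttling proxy in the list above
can be worked through in a short Python sketch. For the products to
come out in seconds, the quotas have to be expressed as seconds per
byte (the inverse of a bytes-per-second limit); that unit, the
`required_delay` function, and the example limits are assumptions made
for illustration, not part of any existing backend.

```python
def required_delay(now, last_sent, bytes_written, bytes_read,
                   write_secs_per_byte, read_secs_per_byte):
    """Seconds to wait before forwarding the next request."""
    # Minimum gap demanded by whichever direction used more of its quota.
    min_gap = max(write_secs_per_byte * bytes_written,
                  read_secs_per_byte * bytes_read)
    # The clock started when the previous request was SENT, so time the
    # backend spent handling it already counts towards the gap.
    elapsed = now - last_sent
    return max(0.0, min_gap - elapsed)


# Hypothetical limits: 128 KiB/s for writes, 1 MiB/s for reads.
WRITE_QUOTA = 1.0 / (128 * 1024)
READ_QUOTA = 1.0 / (1024 * 1024)

# The previous request was sent at t=100s and wrote one full 1 MiB block;
# the next request arrives at t=103s.
delay = required_delay(now=103.0, last_sent=100.0,
                       bytes_written=1024 * 1024, bytes_read=0,
                       write_secs_per_byte=WRITE_QUOTA,
                       read_secs_per_byte=READ_QUOTA)
print("delay before next request: %.1f seconds" % delay)
```

With these numbers the 1 MiB write demands an 8 second gap, 3 seconds
of which have already passed, leaving 5 seconds to wait.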

## Core

* Add the option to support full HMAC for salted hashing; make this
  the recommended setting, with syntax `(hash tiger hmac "SALT")`, and
  require `(hash tiger simple "SALT")` to explicitly request legacy
  mode. Note this in the upgrade notes for existing users.

* Add the option to append HMACed signatures to the post-encryption
  blocks in the archive, to protect against people who tamper with
  blocks in order to try and exploit vulnerabilities in the
  decompression or decryption code (and to more quickly detect
  tampering in the pipeline, to reduce the DoS effect of all that
  wasted decryption and decompression, potentially including things
  that decrypt to giant amounts of RAM).

* When extracting, wrap each restore operation under
  extract-directory! with exception handling that logs the error and
  then continues with the next dirent in the directory.

* Check sensibly-worded conditions are raised when we try and fetch
  nonexistent or corrupted blocks from the archive in `archive-get`.

* Make `fold-archive-node`'s listing of tags at the top level report
  the lock status of the tags.

* More stats. Log bytes written AFTER compression and encryption in
  `archive-put!`. Log snapshot start and end times in the snapshot
  object.

* SIGINFO support. Add a SIGINFO handler that sets a flag, and make
  the `store-file!` and `store-directory!` main loops look for the
  flag and, if set, display what path we're working on, and perhaps a
  quick summary of the bytes/blocks stored/skipped stats.

* Clarify what characters are legal in tag names sent to backends, and
  what are legal in human-supplied tag names, and check that
  human-supplied tag names match a regular expression. Leave space for
  system-only tag names for storing archive metadata; suggest making a
  hash sign illegal in tag names.

* Clarify what characters are legal in block keys. Ugarit will only
  issue [a-zA-Z0-9] for normal blocks, but may use other characters
  (hash?) for special metadata blocks; establish a contract of what
  backends must support (a-z, A-Z, 0-9, hash?)

* API documentation for the modules we export

* Encrypt tags, with a hash inside to check it's decrypted
  correctly. Add a special "#ugarit-archive-format" tag that records a
  format version number, to note that this change has been
  applied. Provide an upgrade tool. Don't do auto-upgrades, or
  attackers will be able to drop in plaintext tags.

* Store a test block in the archive that is used to check the same
  encryption and hash settings are used for an archive, consistently
  (changing compression setting is supported, but changing encryption
  or hash will lead to confusion). Encrypt the hash of the passphrase
  and store it in the test block, which should have a name that cannot
  clash with any actual hash (eg, use non-hex characters in its
  name). When the block does not exist, create it; when it does exist,
  check it against the current encryption and hashing settings to see
  if it matches. When creating a new block, if the "prompt" passphrase
  specification mechanism is in use, prompt again to confirm the
  passphrase. If no encryption is in use, check the hash algorithm
  doesn't change by storing the hash of a constant string,
  unencrypted. To make brute-forcing the passphrase or hash-salt
  harder, consider applying the hash a large number of times, to
  increase the compute cost of checking it. Thanks to Andy Bennett for
  this idea.

* More `.ugarit` actions. Right now we just have exclude and include;
  we might specify less-safe operations such as commands to run before
  and after snapshotting certain subtrees, or filters (don't send this
  SVN repository; instead send the output of `svnadmin dump`),
  etc. Running arbitrary commands is a security risk if random users
  write their own `.ugarit` files - so we'd need some trust-based
  mechanism; they'd need to be explicitly enabled in `ugarit.conf`,
  then a `.ugarit` option could disable all unsafe operations in a
  subtree.

* `.ugarit` rules for file sizes. In particular, a rule to exclude
  files above a certain size. Thanks to Andy Bennett for this idea.

* Support for FFS flags, Mac OS X extended filesystem attributes, NTFS
  ACLs/streams, FAT attributes, etc... Ben says to look at Box Backup
  for some code to do that sort of thing.

* Deletion support - letting you remove snapshots. Perhaps you might
  want to remove all snapshots older than a given number of days on a
  given tag. Or just remove X out of Y snapshots older than a given
  number of days on a given tag. We have the core support for this;
  just find a snapshot and `unlink-directory!` its contents, leaving a
  dangling pointer from the snapshot, and write the snapshot handling
  code to expect this. Again, check Box Backup for that.

* Option, when backing up, to not cross mountpoints

* Option, when backing up, to store inode number and mountpoint path
  in directory entries, and then when extracting, keeping a dictionary
  of this unique identifier to pathname, so that if a file to be
  extracted is already in the dictionary and the hash is the same, a
  hardlink can be created.

* Archival mode as well as snapshot mode. Whereas a snapshot record
  takes a filesystem tree and adds it to a chain of snapshots of the
  same filesystem tree, archival mode takes a filesystem tree and
  inserts it into a search tree anchored on the specified tag,
  indexing it on a list of key+value properties supplied at archival
  time. An archive tag is represented in the virtual filesystem as a
  directory full of archive objects, each identified by their full
  hash; each archive object references the filesystem root as well as
  the key+value properties, and optionally a parent link like a
  snapshot, as an archive can be made that explicitly replaces an
  earlier one and should replace it in the index; there is also a
  virtual directory for each indexed property which contains a
  directory for each value of the property, full of symlinks to the
  archive objects, and subdirectories that allow multi-property
  searches on other properties. The index itself is stored as a B-Tree
  with a reasonably small block size; when it's updated, the modified
  index blocks are replaced, thereby gaining new hashes, so their
  parents need replacing, all the way up the tree until a new root
  block is created. The existing block unlink mechanism in the
  backends will reclaim storage for blocks that are superseded, if the
  backend supports it. When this is done, ugarit will offer the option
  of snapshotting to a snapshot tag, or archiving to an archive tag,
  or archiving to an archive tag while replacing a specified archive
  object (nominated by path within the tag), which causes it to be
  removed from the index (except from the directory listing all
  archives by hash), and the new archive object is inserted,
  referencing the old one as a parent.

* Dump/restore format. On a dump, walk an arbitrary subtree of an
  archive, serialising objects. Do not put any hashes in the dump
  format - dump out entire files, and just identify objects with
  sequential numbers when forming the directory / snapshot trees. On a
  restore, read the same format and slide it into an archive (creating
  any required top-level snapshot objects if the dump doesn't start
  from a snapshot) and put it onto a specified tag. The
  intention is that this format can be used to migrate your stuff
  between archives, perhaps to change to a better backend.

* Optional progress reporting callback from within store-file! and
  store-directory!, called on each block within a file or on each
  filesystem object, respectively.

* Add a procedure to resolve a path within the archive node tree from
  any root node. Pass in the path as a list of strings, with the
  symbols `.` and `..` being usable as meta-characters to do nothing
  or to go up a level. Write a utility procedure to parse a string
  into such a form. Make it recognise and follow symlinks.

* When symlinks are traversed by the path resolver and by the explore
  CLI, make `<tag>/current` be a symlink to the timestamp of the
  current snapshot rather than a clone of it, for neatness.
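
The path resolver proposed in the list above might look roughly like
this. It is a Python sketch over nested dictionaries, not the archive
node tree itself: the `parse_path` and `resolve` names, the use of
plain strings where the Scheme version would use symbols for `.` and
`..`, the choice to stay put when `..` is applied at the root, and the
omission of symlink handling are all assumptions.

```python
def parse_path(text):
    """Split a slash-separated string into a list of path components."""
    return [part for part in text.split("/") if part]


def resolve(root, parts):
    """Walk `parts` from `root`; "." does nothing and ".." goes up a level."""
    stack = [root]  # ancestors of the current node, current node last
    for part in parts:
        if part == ".":
            continue
        if part == "..":
            if len(stack) > 1:  # assumed: ".." at the root stays at the root
                stack.pop()
            continue
        node = stack[-1]
        if not isinstance(node, dict) or part not in node:
            raise KeyError("no such entry: %s" % part)
        stack.append(node[part])
    return stack[-1]


# Toy tree: directories are dicts, files are opaque block keys.
tree = {
    "mytag": {
        "current": {
            "etc": {"passwd": "key-of-passwd"},
            "home": {},
        }
    }
}

print(resolve(tree, parse_path("mytag/current/./etc/../etc/passwd")))
```

Keeping a stack of ancestors is what makes `..` work without parent
pointers in the nodes, which suits a tree whose nodes are immutable
content-addressed blocks.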

## Front-end

* Install progress reporting callbacks to report progress to user;
  option for quiet (no reporting), normal (reporting if >60s have
  passed since last time), or verbose (report every file), or very
  verbose (report every file and block).

* Make the explore CLI let you cd into symlinks

* Add a command to force removing a tag lock.

* Add a command to list all the tags (with a * next to locked tags)

* Add a command to list the contents of any directory in the archive
  node tree

* Better error messages

* API mode: Works something like the backend API, except at the
  archive level. Supports all the important archive operations, plus
  access to sexpr stream writers and key stream writers,
  archive-node-fold, etc. Requested by andyjpb, perhaps I can write
  the framework for this and then let him add API functions as he desires.

* Command-line support to extract the contents of a given path in the
  archive, rather than needing to use explore mode. Also the option to
  extract given just a block key (useful when reading from keys logged
  manually at snapshot time).

* FUSE/9p support. Mount it as a read-only filesystem :-D Then
  consider adding Fossil-style writing to the `current` of a snapshot,
  with copy-on-write of blocks to a buffer area on the local disk,
  then the option to make a snapshot of `current`. Put these into
  separate "ugarit-frontend-9p" and "ugarit-frontend-fuse" eggs, to
  control the dependencies.

* Filesystem watching. Even with the hash-caching trick, a snapshot
  will still involve walking the entire directory tree and looking up
  every file in the hash cache. We can do better than that - some
  platforms provide an interface for receiving real-time notifications
  of changed or added files. Using this, we could allow ugarit to run
  in continuous mode, keeping a log of file notifications from the OS
  while it does an initial full snapshot. It can then wait for a
  specified period (one hour, perhaps?), accumulating names of files
  changed since it started, before then creating a new snapshot by
  uploading just the files it knows to have changed, while subsequent
  file change notifications go to a new list.

## Testing

* An option to verify a snapshot, walking every block in it checking
  there's no dangling references, and that everything matches its
  hash, without needing to put it into a filesystem, and applying any
  other sanity checks we can think of en route. Optionally compare it
  to an on-disk filesystem, while we're at it.

* A unit test script around the `ugarit` command-line tool; the corpus
  should contain a mix of tiny and huge files and directories, awkward
  cases for sharing of blocks (many identical files in the same dir,
  etc), complex forms of file metadata, and so on. It should archive
  and restore the corpus several times over with each hash,
  compression, and encryption option.

* Testing crashes. See about writing a test backend binary that either
  raises an error or just kills the process directly after N
  operations, and sit in a loop running it with increasing N. Take N
  from an environment variable to make it easier to automate this.

* Extract the debugging backend from backend-devtools into a proper
  backend binary that takes a path to a log file and a backend command
  line to wrap.

* Invoke the archive unit tests with every compression and encryption
  option, and different hashing algorithms with and without keys
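
The crash-testing idea in the list above can be modelled in-process as
follows. This Python sketch only raises an error after N operations
(it does not kill a process), and the `FailAfterN` wrapper, the
dictionary-backed store, the five-write workload, and the
`UGARIT_TEST_FAIL_AFTER` environment variable name are all made up for
the example.

```python
import os


class InjectedCrash(Exception):
    pass


class FailAfterN:
    """Wraps a dict-backed store and fails once `limit` operations succeed."""

    def __init__(self, inner, limit):
        self.inner = inner
        self.limit = limit
        self.ops = 0

    def _tick(self):
        if self.ops >= self.limit:
            raise InjectedCrash("failing after %d operations" % self.ops)
        self.ops += 1

    def put(self, key, data):
        self._tick()
        self.inner[key] = data

    def get(self, key):
        self._tick()
        return self.inner[key]


def workload(backend):
    # Stand-in for a snapshot: five writes in a fixed order.
    for i in range(5):
        backend.put("block-%d" % i, b"payload")


def survivors(limit):
    """Run the workload against a backend that fails after `limit` ops."""
    store = {}
    try:
        workload(FailAfterN(store, limit))
    except InjectedCrash:
        pass
    return sorted(store)


# Take N from the (hypothetical) environment variable if it is set,
# otherwise sweep N upwards until the workload completes untouched.
env_limit = os.environ.get("UGARIT_TEST_FAIL_AFTER")
limits = [int(env_limit)] if env_limit is not None else range(0, 7)
for n in limits:
    print(n, survivors(n))
```

A real test would go on to check, for each N, that whatever survived
still forms a consistent archive.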

# Acknowledgements

The original idea came from Venti, a content-addressed storage system
from Plan 9. Venti is usable directly by user applications, and is
also integrated with the Fossil filesystem to support snapshotting the
status of a Fossil filesystem. Fossil allows references to either be
to a block number on the Fossil partition or to a Venti key; so when a
filesystem has been snapshotted, all it now contains is a "root
directory" pointer into the Venti archive, and any files modified
thereafter are copied-on-write into Fossil where they may be modified
until the next snapshot.

1217We're nowhere near that exciting yet, but using FUSE, we might be able
1218to do something similar, which might be fun. However, Venti inspired
1219me when I read about it years ago; it showed me how elegant
1220content-addressed storage is. Finding out that the Git version control
1221system used the same basic tricks really just confirmed this for me.
1222
1223Also, I'd like to tip my hat to Duplicity. With the changing economics
1224of storage presented by services like Amazon S3 and rsync.net, I
1225looked to Duplicity as it provided both SFTP and S3 backends. However,
1226it worked in terms of full and incremental backups, a model that I
1227think made sense for magnetic tapes, but loses out to
1228content-addressed snapshots when you have random-access
1229media. Duplicity inspired me by its adoption of multiple backends, the
1230very backends I want to use, but I still hungered for a
1231content-addressed snapshot store.
1232
1233I'd also like to tip my hat to Box Backup. I've only used it a little,
1234because it requires a special server to manage the storage (and I want
1235to get my backups *off* of my servers), but it also inspires me with
1236directions I'd like to take Ugarit. It's much more aware of real-time
1237access to random-access storage than Duplicity, and has a very
1238interesting continuous background incremental backup mode, moving away
1239from the tape-based paradigm of backups as something you do on a
1240special day of the week, like some kind of religious observance. I
1241hope the author Ben, who is a good friend of mine, won't mind me
1242plundering his source code for details on how to request real-time
1243notification of changes from the filesystem, and how to read and write
1244extended attributes!
1245
1246Moving on from the world of backup, I'd like to thank the Chicken Team
for producing Chicken Scheme. Felix and the community at #chicken on
Freenode have particularly inspired me with their can-do attitudes to
combining programming-language elegance and pragmatic engineering -
two things many would think un-unitable enemies. Of course, they
didn't do it all themselves - R5RS Scheme and the SRFIs provided a
solid foundation to build on, and there's a cast of many more in the
Chicken community, working on other bits of Chicken or just egging
everyone on. And I can't not thank Henry Baker for writing the seminal
paper on the technique Chicken uses to implement full tail-calling
Scheme with cheap continuations on top of C; Henry already had my
admiration for his work on combining elegance and pragmatism in linear
logic. Why doesn't he return my calls? I even sent flowers.

A special thanks should go to Christian Kellermann for porting Ugarit
to use Chicken 4 modules, too, which was otherwise a big bottleneck to
development, as I was stuck on Chicken 3 for some time! And to Andy
Bennett for many insightful conversations about future directions.

Thanks to the early adopters who brought me useful feedback, too!

And I'd like to thank my wife for putting up with me spending several
evenings and weekends and holiday days working on this thing...

# Version history

* 1.0.2: Made the file cache also commit periodically, rather than on
  every write, in order to improve performance. Counting blocks and
  bytes uploaded / reused, and file cache bytes as well as hits;
  reporting same in snapshot UI and logging same to snapshot
  metadata. Switched to the `posix-extras` egg and ditched our own
  `posixextras.scm` wrappers. Used the `parley` egg in the `ugarit
  explore` CLI for line editing. BUGFIX: Made file cache check the
  file hashes it finds in the cache actually exist in the archive, to
  protect against the case where a crash of some kind has caused
  unflushed changes to be lost; the file cache may well have committed
  changes that the backend hasn't, leading to references to
  nonexistent blocks. Note that we assume that archives are
  sequentially safe, e.g. if the final indirect block of a large file
  made it, all the partial blocks must have made it too. BUGFIX: Added
  an explicit `flush!` command to the backend protocol, and put
  explicit flushes at critical points in higher layers
  (`backend-cache`, the archive abstraction in the Ugarit core, and
  when tagging a snapshot) so that we ensure the blocks we point at
  are flushed before committing references to them in the
  `backend-cache` or file caches, or into tags, to ensure crash
  safety. BUGFIX: Made the splitlog backend never exceed the file size
  limit (except when passed blocks that, plus a header, are larger
  than it), rather than letting a partial block hang over the
  'end'. BUGFIX: Fixed tag locking, which was broken all over the
  place. Concurrent snapshots to the same tag should now block for one
  another, although why you'd want to *do* that is
  questionable. BUGFIX: Fixed generation of non-keyed hashes, which
  was incorrectly appending the type to the hash without an outer
  hash. This breaks backwards compatibility, but nobody was using the
  old algorithm, right? I'll introduce it as an option if required.

* 1.0.1: Consistency check on read blocks by default. Removed warning
  about deletions from backend-cache; we need a new mechanism to
  report warnings from backends to the user. Made backend-cache and
  backend-fs/splitlog commit periodically rather than after every
  insert, which should speed up snapshotting a lot, and reused the
  prepared statements rather than re-preparing them all the
  time. BUGFIX: splitlog backend now creates log files with
  "rw-------" rather than "rwx------" permissions; and all sqlite
  databases (splitlog metadata, cache file, and file-cache file) are
  created with "rw-------" rather than "rw-r--r--".

* 1.0: Migrated from gdbm to sqlite for metadata storage, removing the
  GPL taint. Unit test suite. backend-cache made into a separate
  backend binary. Removed backend-log. BUGFIX: file caching uses mtime *and*
  size now, rather than just mtime. Error handling so we skip objects
  that we cannot do something with, and proceed to try the rest of the
  operation.

* 0.8: decoupling backends from the core and into separate binaries,
  accessed via standard input and output, so they can be run over SSH
  tunnels and other such magic.

* 0.7: file cache support, sorting of directories so they're archived
  in canonical order, autoloading of hash/encryption/compression
  modules so they're not required dependencies any more.

* 0.6: .ugarit support.

* 0.5: Keyed hashing so attackers can't tell what blocks you have,
  markers in logs so the index can be reconstructed, sha2 support, and
  passphrase support.

* 0.4: AES encryption.

* 0.3: Added splitlog backend, and fixed a .meta file typo.

* 0.2: Initial public release.

* 0.1: Internal development release.