Changeset 25501 in project for release/4/ugarit/trunk/README.txt


Timestamp:
11/14/11 13:51:10
Author:
Alaric Snell-Pym
Message:

ugarit: Significant README improvements, and enabled consistency check of read blocks by default, and removed warning about deletions from backend-cache.

File:
1 edited

  • release/4/ugarit/trunk/README.txt

Differences from r25479 to r25501:
Ugarit is a backup/archival system based around content-addressable storage.

This allows it to upload incremental backups to a remote server or a local filesystem such as an NFS share or a removable hard disk, yet have the archive instantly able to produce a full snapshot on demand rather than needing to download a full snapshot plus all the incrementals since. The content-addressable storage technique means that the incrementals can be applied to a snapshot on various kinds of storage without needing intelligence in the storage itself - so the snapshots can live within Amazon S3 or on a removable hard disk.

Also, the same storage can be shared between multiple systems that all back up to it - and the incremental upload algorithm means that any files shared between the servers only need to be uploaded once. If you back up a complete server, then go and back up another that is running the same distribution, all the files in `/bin` and so on that are already in the storage will not need to be backed up again; the system will automatically spot that they're already there, and not upload them again.

## So what's that mean in practice?

You can run Ugarit to back up any number of filesystems to a shared archive, and on every backup, Ugarit will only upload files or parts of files that aren't already in the archive - be they from the previous snapshot, earlier snapshots, snapshots of entirely unrelated filesystems, etc. Every time you do a snapshot, Ugarit builds a complete directory tree of the snapshot in the archive - but reusing any parts of files, files, or entire directories that already exist anywhere in the archive, and only uploading what doesn't already exist.

The support for parts of files means that, in many cases, gigantic files like database tables and virtual disks for virtual machines will not need to be uploaded entirely every time they change, as only the changed sections will be identified and uploaded.

Because a complete directory tree exists in the archive for any snapshot, the extraction algorithm is incredibly simple - and, therefore, incredibly reliable and fast. Simple, reliable, and fast are just what you need when you're trying to reconstruct the filesystem of a live server.

Also, it means that you can do lots of small snapshots. If you run a snapshot every hour, then only a megabyte or two might have changed in your filesystem, so you only upload a megabyte or two - yet you end up with a complete history of your filesystem at hourly intervals in the archive.

Conventional backup systems usually store a full backup then incrementals to their archives, meaning that doing a restore involves reading the full backup then reading every incremental since and applying them - so to do a restore, you have to download *every version* of the filesystem you've ever uploaded, or you have to do periodic full backups (even though most of your filesystem won't have changed since the last full backup) to reduce the number of incrementals required for a restore. Better results are had from systems that use a special backup server to look after the archive storage, which accepts incremental backups and applies them to the snapshot it keeps, in order to maintain a most-recent snapshot that can be downloaded in a single run; but they then restrict you to using dedicated servers as your archive stores, ruling out cheaply scalable solutions like Amazon S3, or just backing up to a removable USB or eSATA disk you attach to your system whenever you do a backup. And dedicated backup servers are complex pieces of software; can you rely on something complex for the fundamental foundation of your data security system?

## System Requirements

Ugarit should run on any POSIX-compliant system that can run [Chicken Scheme](http://www.call-with-current-continuation.org/). It stores and restores all the file attributes reported by the `stat` system call - POSIX mode permissions, UID, GID, mtime, and optionally atime and ctime (although the ctime cannot be restored due to POSIX restrictions). Ugarit will store files, directories, device and character special files, symlinks, and FIFOs.

Support for extended filesystem attributes - ACLs, alternative streams, forks and other metadata - is possible, due to the extensible directory entry format; support for such metadata will be added as required.

Currently, only local filesystem-based archive storage backends are complete: these are suitable for backing up to a removable hard disk or a filesystem shared via NFS or other protocols. However, the backend can be accessed via an SSH tunnel, so a remote server that you can install Ugarit on can run the backend and act as a remote archive.

However, the next backends to be implemented will be one for Amazon S3, and an SFTP backend for storing archives anywhere you can ssh to. Other backends will be implemented on demand; an archive can, in principle, be stored on anything that can store files by name, report on whether a file already exists, and efficiently download a file by name. This rules out magnetic tapes due to their requirement for sequential access.

Although we need to trust that a backend won't lose data (for now), we don't need to trust the backend not to snoop on us, as Ugarit optionally encrypts everything sent to the archive.

## Terminology

A Ugarit backend is the software module that handles backend storage. An archive is an actual storage system storing actual data, accessed through the appropriate backend for that archive. The backend may run locally under Ugarit itself, or via an SSH tunnel, on a remote server where it is installed.

For example, if you use the recommended "splitlog" filesystem backend, your archive might be `/mnt/bigdisk` on the server `prometheus`. The backend (which is compiled along with the other filesystem backends in the `backend-fs` binary) must be installed on `prometheus`, and Ugarit clients all over the place may then use it via ssh to `prometheus`. However, even with the filesystem backends, the actual storage might not be on `prometheus` where the backend runs - `/mnt/bigdisk` might be an NFS mount, or a mount from a storage-area network. This ability to delegate via SSH is particularly useful with the "cache" backend, which reduces latency by storing a cache of what blocks exist in a backend, thereby making it quicker to identify already-stored files; a cluster of servers all sharing the same archive might all use SSH tunnels to access an instance of the "cache" backend on one of them (using some local disk to store the cache), which proxies the actual archive storage to an archive on the other end of a high-latency Internet link, again via an SSH tunnel.
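
For instance, an archive identifier for that scenario might look like this (using the identifier syntax described under "Setting up an archive" below; the exact paths are illustrative, not from the original README):

      "ssh prometheus 'backend-fs splitlog /mnt/bigdisk/logs /mnt/bigdisk/metadata 900000000'"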

## What's in an archive?

A Ugarit archive contains a load of blocks, each up to a maximum size (usually 1MiB, although other backends might impose smaller limits). Each block is identified by the hash of its contents; this is how Ugarit avoids ever uploading the same data twice - it checks whether the data to be uploaded already exists in the archive by looking up the hash. The contents of the blocks are compressed and then encrypted before upload.

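In other words, storing a block is idempotent and keyed purely by content. A minimal sketch of the idea in Scheme - with hypothetical `hash-of`, `block-exists?` and `upload!` procedures, not Ugarit's actual internals:

      (define (store-block! block)
        (let ((key (hash-of block)))    ; content-addressed name
          (unless (block-exists? key)   ; skip data the archive already holds
            (upload! key block))        ; compression/encryption happen on the way
          key))                         ; the hash serves as the block's identifier
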
Every file uploaded is, unless it's small enough to fit in a single block, chopped into blocks, and each block uploaded. This way, the entire contents of your filesystem can be uploaded - or, at least, only the parts of it that aren't already there! The blocks are then tied together to create a snapshot by uploading blocks full of the hashes of the data blocks, and directory blocks are uploaded listing the names and attributes of files in directories, along with the hashes of the blocks that contain the files' contents. Even the blocks that contain lists of hashes of other blocks are subject to checking for pre-existence in the archive; if only a few MiB of your hundred-GiB filesystem has changed, then even the index blocks and directory blocks are re-used from previous snapshots.

Once uploaded, a block in the archive is never again changed. After all, if its contents changed, its hash would change, so it would no longer be the same block! However, every block has a reference count, tracking the number of index blocks that refer to it. This means that the archive knows which blocks are shared between multiple snapshots (or shared *within* a snapshot - if a filesystem has more than one copy of the same file, still only one copy is uploaded), so that if a given snapshot is deleted, then the blocks that only that snapshot is using can be deleted to free up space, without corrupting other snapshots by deleting blocks they share. Keep in mind, however, that not all storage backends may support this - there are certain advantages to being an append-only archive. For a start, you can't delete something by accident! The supplied fs backend supports deletion, while the splitlog backend does not yet. However, the actual snapshot deletion command hasn't been implemented yet either, so it's a moot point for now...

Finally, the archive contains objects called tags. Unlike the blocks, the tags' contents can change, and they have meaningful names rather than being identified by hash. Tags identify the top-level blocks of snapshots within the system, from which (by following the chain of hashes down through the index blocks) the entire contents of a snapshot may be found. Unless you happen to have recorded the hash of a snapshot somewhere, the tags are where you find snapshots from when you want to do a restore!

Whenever a snapshot is taken, as soon as Ugarit has uploaded all the files, directories, and index blocks required, it looks up the tag you have identified as the target of the snapshot. If the tag already exists, then the snapshot it currently points to is recorded in the new snapshot as the "previous snapshot"; then a snapshot header containing the previous snapshot hash, along with the date and time and any comments you provide for the snapshot, is uploaded (as another block, identified by its hash). The tag is then updated to point to the new snapshot.

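That tagging step, as a sketch in Scheme reusing `store-block!` from the earlier sketch - `read-tag`, `write-tag!` and `make-snapshot-header` are hypothetical names, not Ugarit's actual API:

      (define (finish-snapshot! tag root-key comment)
        (let* ((previous (read-tag tag))          ; prior snapshot's key, or #f
               (header (make-snapshot-header previous root-key
                                             (current-seconds) comment))
               (snap-key (store-block! header)))  ; the header is just another block
          (write-tag! tag snap-key)))             ; the only mutation: move the tag
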
This way, each tag actually identifies a chronological chain of snapshots. Normally, you would use a tag to identify a filesystem being backed up; you'd keep snapshotting the filesystem to the same tag, resulting in all the snapshots of that filesystem hanging from the tag. But if you wanted to remember any particular snapshot (perhaps if it's the snapshot you take before a big upgrade or other risky operation), you can duplicate the tag, in effect 'forking' the chain of snapshots much like a branch in a version control system.

# Using Ugarit
     
## Setting up an archive

Firstly, you need to know the archive identifier for the place you'll be storing your archives. This depends on your backend. The archive identifier is actually the command line used to invoke the backend for a particular archive; communication with the archive is via standard input and output, which makes it easy to tunnel via ssh.

### Local filesystem backends

These backends use the local filesystem to store the archives. Of course, the "local filesystem" on a given server might be an NFS mount or mounted from a storage-area network.

#### Logfile backend

The logfile backend works much like the original Venti system.

      "backend-fs splitlog ...log directory... ...metadata file... max-logfile-size"

For most platforms, a max-logfile-size of 900000000 (900 MB) should suffice. For now, don't go much bigger than that on 32-bit systems until Chicken's `file-position` function is fixed to work with files >1GB in size.

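For example, a complete splitlog identifier might look like this (paths hypothetical):

      "backend-fs splitlog /mnt/bigdisk/logs /mnt/bigdisk/metadata 900000000"
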
#### Filesystem backend

The filesystem backend creates archives by storing each block or tag in its own file, in a directory. To keep the objects-per-directory count down, it'll split the files into subdirectories. Because of this, it uses a stupendous number of inodes (more than the filesystem being backed up). Only use it if you don't mind that; splitlog is much more efficient.

To set up a new filesystem-backend archive, just create an empty directory that Ugarit will have write access to when it runs. It will probably run as root in order to be able to access the contents of files that aren't world-readable (although that's up to you), so be careful of NFS mounts that have `maproot=nobody` set!

You can then refer to it using the following archive identifier:

      "backend-fs fs ...path to directory..."

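For instance, with a hypothetical path:

      "backend-fs fs /mnt/backups/ugarit-archive"
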
### Proxying backends

These backends wrap another archive identifier, to which the actual storage task is delegated, but add some value along the way.

#### SSH tunnelling

It's easy to access an archive stored on a remote server. The caveat is that the backend then needs to be installed on the remote server! Since archives are accessed by running the supplied command, and then talking to them via stdin and stdout, the archive identifier need only be:

      "ssh ...hostname... '...remote archive identifier...'"

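For example (hostname, user, and remote path hypothetical):

      "ssh backup@prometheus 'backend-fs fs /mnt/backups/ugarit-archive'"
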
#### Cache backend

The cache backend is used to cache a list of what blocks exist in the proxied backend, so that it can answer queries as to the existence of a block rapidly, even when the proxied backend is on the end of a high-latency link (eg, the Internet). This should speed up snapshots, as existing files are identified by asking the backend if the archive already has them.

The cache backend works by storing the cache in a local sqlite file. Given a place for it to store that file, usage is simple:

      "backend-cache ...path to cachefile... '...proxied archive identifier...'"

The cache file will be automatically created if it doesn't already exist, so make sure there's write access to the containing directory.

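For example, caching a remote archive reached over ssh (hostname and paths hypothetical; the proxied identifier is passed as a single quoted argument):

      "backend-cache /var/ugarit/cache.sqlite 'ssh prometheus backend-fs fs /mnt/backups/archive'"
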
 - WARNING - WARNING - WARNING - WARNING - WARNING - WARNING -

If you use a cache on an archive shared between servers, make sure that you either:

 * Never delete things from the archive

or

 * Make sure all access to the archive is via the same cache

If a block is deleted from an archive, and a cache on that archive is not aware of the deletion (as it did not go "through" the caching proxy), then the cache will record that the block exists in the archive when it does not. This means that if a snapshot made through the cache would use that block, the block will be assumed to already exist in the archive, so it will not be uploaded, and a dangling reference will result!

Some setups which *are* safe:

 * A single server using an archive via a cache, not sharing it with anyone else.

 * A pool of servers using an archive via the same cache.

 * A pool of servers using an archive via one or more caches, and maybe some not via the cache, where nothing is ever deleted from the archive.

 * A pool of servers using an archive via one cache, and maybe some not via the cache, where deletions are only performed on servers using the cache, so the cache is always aware.

## Writing a ugarit.conf

      (storage <archive identifier>)
      (hash tiger "<A secret string>")
      [double-check]
      [(compression [deflate|lzma])]
      [(encryption aes <key>)]

      [(rule ...)]

The hash line chooses a hash algorithm. Currently Tiger-192 (`tiger`), SHA-256 (`sha256`), SHA-384 (`sha384`) and SHA-512 (`sha512`) are supported; if you omit the line then Tiger will still be used, but it will be a simple hash of the block with the block type appended, which reveals to attackers what blocks you have (as the hash is of the unencrypted block, and the hash is not encrypted). This is useful for development and testing or for use with trusted archives, but not advised for use with archives that attackers may snoop at. Providing a secret string produces a hash function that hashes the block, the type of block, and the secret string, producing hashes that attackers who can snoop the archive cannot use to find known blocks. Whichever hash function you use, you will need to install the required Chicken egg with one of the following commands:

    chicken-install -s tiger-hash  # for tiger
    chicken-install -s sha2        # for the SHA hashes

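For instance, assuming the keyed form takes the same shape as the `tiger` line above, a keyed SHA-256 configuration might read (the secret is, of course, illustrative):

      (hash sha256 "some long random secret string")
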
`double-check`, if present, causes Ugarit to perform extra internal consistency checks during backups, which will detect bugs but may slow things down.

`lzma` is the recommended compression option for low-bandwidth backends or when space is tight, but it's very slow to compress; deflate or no compression at all are better for fast local archives. To have no compression at all, just remove the `(compression ...)` line entirely. Likewise, to use compression, you need to install a Chicken egg:

       chicken-install -s z3       # for deflate
       chicken-install -s lzma     # for lzma

Likewise, the `(encryption ...)` line may be omitted to have no encryption; the only currently supported algorithm is aes (in CBC mode) with a key given in hex, as a passphrase (hashed to get a key), or a passphrase read from the terminal on every run. The key may be 16, 24, or 32 bytes for 128-bit, 192-bit or 256-bit AES. To specify a hex key, just supply it as a string, like so:

      (encryption aes "00112233445566778899AABBCCDDEEFF")
     
      (encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")

...for 256-bit AES.

Alternatively, you can provide a passphrase, and specify how large a key you want it turned into, like so:

      (encryption aes ([16|24|32] "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))

Finally, the extra-paranoid can request that Ugarit prompt for a passphrase on every run and hash it into a key of the specified length, like so:

      (encryption aes ([16|24|32] prompt))

(note the lack of quotes around `prompt`, distinguishing it from a passphrase)

Again, as it is an optional feature, to use encryption, you must install the appropriate Chicken egg:

       chicken-install -s aes

of a filesystem tree. The file cache is a file (which Ugarit will create if it doesn't already exist) mapping filenames to (mtime,size,hash) tuples; as it scans the filesystem, if it finds a file in the cache and the mtime and size have not changed, it will assume it is already archived under the specified hash. This saves it
     
For example:

      (storage "ssh ugarit@spiderman 'backend-fs splitlog /mnt/ugarit-data /mnt/ugarit-metadata/metadata 900000000'")
      (hash tiger "Giung0ahKahsh9ahphu5EiGhAhth4eeyDahs2aiWAlohr6raYeequ8uiUr3Oojoh")
      (encryption aes (32 "deing2Aechediequohdo6Thuvu0OLoh6fohngio9koush9euX6el9iesh6Aef4augh3WiY7phahmesh2Theeziniem5hushai5zigushohnah1quae1ooXo0eingu1Aifeo1eeSheaz9ieSie9tieneibeiPho0quu6um8weiyagh4kaeshooThooNgeyoul2Ahsahgh8imohw3hoyazai9gaph5ohhaechiedeenusaeghahghipe8ii3oo9choh5cieth5iev3jiedohquai4Thiedah5sah5kohcepheixai3aiPainozooc6zohNeiy6Jeigeesie5eithoo0ciiNae8Nee3eiSuKaiza0VaiPai2eeFooNgeengaif9yaiv9rathuoQuohy0ohth6OiL9aisaetheeWoh9aiQu0yoo6aequ3quoiChi7joonohwuvaipeuh2eiPoogh1Ie8tiequesoshaeBue5ieca8eerah0quieJoNoh3Jiesh1chei8weidixeen1yah1ioChie0xaimahWeeriex5eetiichahP9iey5ux7ahGhei7eejahxooch5eiqu0Pheir9Reiri4ahqueijuchae8eeyieMeixa4ciisioloe9oaroof1eegh4idaeNg5aepeip8mah7ixaiSohtoxaiH4oe5eeGoh4eemu7mee8ietaecu6Zoodoo0hoP5uquaish2ahc7nooshi0Aidae2Zee4pheeZee3taerae6Aepu2Ayaith2iivohp8Wuikohvae2Peange6zeihep8eC9mee8johshaech1Ubohd4Ko5caequaezaigohyai1TheeN6Gohva6jinguev4oox2eet5auv0aiyeo7eJieGheebaeMahshifaeDohy8quut4ueFei3eiCheimoechoo2EegiveeDah1sohs7ezee3oaWa2iiv2Chi1haiS5ahph4phu5su0hiocee3ooyaeghang7sho7maiXeo5aex"))
      (compression lzma)
      (file-cache "/var/ugarit/cache")

Be careful to put a set of parentheses around each configuration entry. White space isn't significant, so feel free to indent things and wrap them over lines if you want.

Keep copies of this file safe - you'll need it to do extractions!
     
## Your first backup

Think of a tag to identify the filesystem you're backing up. If it's `/home` on the server `gandalf`, you might call it `gandalf-home`. If it's the entire filesystem of the server `bilbo`, you might just call it `bilbo`.

Then from your shell, run (as root):
     
      # ugarit snapshot ugarit.conf -c localhost-etc /etc

Specify the `-c` flag if you want to store ctimes in the archive; since it's impossible to restore ctimes when extracting from an archive, doing this is useful only for informational purposes, so it's not done by default. Similarly, atimes aren't stored in the archive unless you specify `-a`, because otherwise, there will be a lot of directory blocks uploaded on every snapshot, as the atime of every file will have been changed by the previous snapshot - so with `-a` specified, on every snapshot, every directory in your filesystem will be uploaded! Ugarit will happily restore atimes if they are found in an archive; their storage is made optional simply because uploading them is costly and rarely useful.

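For instance, to record atimes as well (assuming `-a` and `-c` combine as separate flags; this invocation is an illustration, not documented usage):

      # ugarit snapshot ugarit.conf -c -a localhost-etc /etc
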
## Exploring the archive

Now you have a backup, you can explore the contents of the archive. This need not be done as root, as long as you can read `ugarit.conf`; however, if you want to extract files, run it as root so the uids and gids can be set.

      $ ugarit explore <ugarit.conf>

This will put you into an interactive shell exploring a virtual filesystem. The root directory contains an entry for every tag; if you type `ls` you should see your tag listed, and within that tag, you'll find a list of snapshots, in descending date order, with a special entry `current` for the most recent snapshot. Within a snapshot, you'll find the root directory of your snapshot, and will be able to `cd` into subdirectories, and so on:

      > ls
     
      current <snapshot>
      /Test> cd current
      /Test/current> ls
      README.txt <file>
      LICENCE.txt <symlink>
     
      ctime: 1231988569.0

As well as exploring around, you can also extract files or directories (or entire snapshots) by using the `get` command. Ugarit will do its best to restore the metadata of files, subject to the rights of the user you run it as.

Type `help` to get help in the interactive shell.
     
## Duplicating tags

As mentioned above, you can duplicate a tag, creating two tags that refer to the same snapshot and its history but that can then have their own subsequent history of snapshots applied to each independently, with the following command:

      $ ugarit fork <ugarit.conf> <existing tag> <new tag>
     
## `.ugarit` files

By default, Ugarit will archive everything it finds in the filesystem tree you tell it to snapshot. However, this might not always be desired; so we provide the facility to override this with `.ugarit` files, or global rules in your `.conf` file.

Note: The syntax of these files is provisional, as I want to experiment with usability; the current syntax is ugly, so please don't be surprised if the format changes in incompatible ways in subsequent versions!

In quick summary, if you want to ignore all files or directories matching a glob in the current directory and below, put the following in a `.ugarit` file in that directory:

      (* (glob "*~") exclude)

You can write quite complex expressions as well as just globs. The full set of rules is:

* `(glob "`*pattern*`")` matches files and directories whose names match the glob pattern

* `(name "`*name*`")` matches files and directories with exactly that name (useful for files called `*`...)

* `(modified-within ` *number* ` seconds)` matches files and directories modified within the given number of seconds

* `(modified-within ` *number* ` minutes)` matches files and directories modified within the given number of minutes

* `(modified-within ` *number* ` hours)` matches files and directories modified within the given number of hours

* `(modified-within ` *number* ` days)` matches files and directories modified within the given number of days

* `(not ` *rule*`)` matches files and directories that do not match the given rule

* `(and ` *rule* *rule...*`)` matches files and directories that match all the given rules

* `(or ` *rule* *rule...*`)` matches files and directories that match any of the given rules

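These rules combine freely; for instance, to exclude log files that haven't been modified in the last week, you might write (an illustrative rule composed from the forms above, not one from the original README):

      (* (and (glob "*.log") (not (modified-within 7 days))) exclude)
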
Also, you can override a previous exclusion with an explicit include in a lower-level directory:

    (* (glob "*~") include)

You can bind rules to specific directories, rather than to "this directory and all beneath it", by specifying an absolute or relative path instead of the `*`:

    ("/etc" (name "passwd") exclude)

If you use a relative path, it's taken relative to the directory of the `.ugarit` file.

You can also put some rules in your `.conf` file, although relative paths are illegal there, by adding lines of this form to the file:

    (rule * (glob "*~") exclude)
     
## What happens if a snapshot is interrupted?

Nothing! Whatever blocks have been uploaded will be uploaded, but the snapshot is only added to the tag once the entire filesystem has been snapshotted. So just start the snapshot again. Any files that have already been uploaded will then not need to be uploaded again, so the second snapshot should proceed quickly to the point where it failed before, and continue from there.

Unless the archive ends up with a partially-uploaded corrupted block due to being interrupted during upload, you'll be fine. The filesystem backend has been written to avoid this by writing the block to a file with the wrong name, then renaming it to the correct name when it's entirely uploaded.

## Should I share a single large archive between all my filesystems?

I think so. Using a single large archive means that blocks shared between servers - eg, software installed from packages and that sort of thing - will only ever need to be uploaded once, saving storage space and upload bandwidth.

# Security model

I have designed and implemented Ugarit to be able to handle cases where the actual archive storage is not entirely trusted.

However, security involves tradeoffs, and Ugarit is configurable in ways that affect its resistance to different kinds of attacks. Here I will list different kinds of attack and explain how Ugarit can deal with them, and how you need to configure it to gain that protection.

## Archive snoopers

This might be somebody who can intercept Ugarit's communication with the archive at any point, or who can read the archive itself at their leisure.

### Reading your data

If you enable encryption, then all the blocks sent to the archive are encrypted using a secret key stored in your Ugarit configuration file. As long as that configuration file is kept safe, and the AES algorithm is secure, then attackers who can snoop the archive cannot decode your data blocks. Enabling compression will also help, as the blocks are compressed before encrypting, which is thought to make cryptographic analysis harder.

Recommendations: Use compression and encryption when there is a risk of archive snooping. Keep your Ugarit configuration file safe using UNIX file permissions (make it readable only by root), and maybe store it on a removable device that's only plugged in when required. Alternatively, use the "prompt" passphrase option, and be prompted for a passphrase every time you run Ugarit, so it isn't stored on disk anywhere.

### Looking for known hashes

A block is identified by the hash of its content (before compression and encryption). If an attacker was trying to find people who own a particular file (perhaps a piece of subversive literature), they could search Ugarit archives for its hash.

However, Ugarit has the option to "key" the hash with a "salt" stored in the Ugarit configuration file. This means that the hashes used are actually a hash of the block's contents *and* the salt you supply. If you do this with a random salt that you keep secret, then attackers can't check your archive for known content just by comparing the hashes.

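Conceptually, the keyed scheme names blocks like this (a sketch only; `hash` and `salt` stand in for the configured hash function and secret string, and this is not Ugarit's literal code):

      (define (block-key contents type)
        (hash (string-append contents type salt)))  ; content, block type, and salt
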
Recommendations: Provide a secret string to your hash function in your Ugarit configuration file. Keep the Ugarit configuration file safe, as per the advice in the previous point.

## Archive modifiers

These folks can modify Ugarit's writes into the archive, its reads back from the archive, or can modify the archive itself at their leisure.

Modifying an encrypted block without knowing the encryption key can at worst be a denial of service, corrupting the block in an unknown way. An attacker who knows the encryption key could replace a block with valid-seeming but incorrect content. In the worst case, this could exploit a bug in the decompression engine, causing a crash or even an exploit of the Ugarit process itself (thereby gaining the powers of a process inspector, as documented below). We can but hope that the decompression engine is robust. Exploits of the decryption engine, or other parts of Ugarit, are less likely due to the nature of the operations performed upon them.

However, if a block is modified, then when Ugarit reads it back, the hash will no longer match the hash Ugarit requested, which will be detected and an error reported. The hash is checked after decryption and decompression, so this check does not protect us against exploits of the decompression engine.

This protection is only afforded when the hash Ugarit asks for is not tampered with. Most hashes are obtained from within other blocks, which are therefore safe unless that block has been tampered with; the nature of the hash tree conveys the trust in the hashes up to the root. The root hashes are stored in the archive as "tags", which an archive modifier could alter at will. Therefore, the tags cannot be trusted if somebody might modify the archive. This is why Ugarit prints out the snapshot hash and the root directory hash after performing a snapshot, so you can record them securely outside of the archive.

The most likely threat posed by archive modifiers is that they could simply corrupt or delete all of your archive, without needing to know any encryption keys.

Recommendations: Secure your archives against modifiers, by whatever means possible. If archive modifiers are still a potential threat, write down a log of your root directory hashes from each snapshot, and keep it safe. When extracting your backups, use the `ls -ll` command in the interface to check the "contents" hash of your snapshots, and check they match the root directory hash you expect.

## Process inspectors

These folks can attach debuggers or similar tools to running processes, such as Ugarit itself.

Ugarit backend processes only see encrypted data, so people who can attach to them gain the powers of archive snoopers and modifiers, and the same conditions apply.

People who can attach to the Ugarit process itself, however, will see the original unencrypted content of your filesystem, and will have full access to the encryption keys and hashing keys stored in your Ugarit configuration. When Ugarit is running with sufficient permissions to restore backups, they will be able to intercept and modify the data as it comes out, and probably gain total write access to your entire filesystem in the process.

Recommendations: Ensure that Ugarit does not run under the same user ID as untrusted software. In many cases it will need to run as root in order to gain unfettered access to read the filesystems it is backing up, or to restore the ownership of files. However, when all the files it backs up are world-readable, it could run as an untrusted user for backups, and where file ownership is trivially reconstructible, it can do restores as a limited user, too.

     737## Attackers in the source filesystem
     738
     739These folks create files that Ugarit will back up one day. By having
     740write access to your filesystem, they already have some level of
     741power, and standard Unix security practices such as storage quotas
     742should be used to control them. They may be people with logins on your
     743box, or more subtly, people who can cause servers to writes files;
     744somebody who sends an email to your mailserver will probably cause
     745that message to be written to queue files, as will people who can
     746upload files via any means.
     747
     748Such attackers might use up your available storage by creating large
     749files. This creates a problem in the actual filesystem, but that
     750problem can be fixed by deleting the files. If those files get
     751archived into Ugarit, then they are a part of that snapshot. If you
     752are using a backend that supports deletion, then (when I implement
     753snapshot deletion in the user interface) you could delete that entire
     754snapshot to recover the wasted space, but that is a rather serious
     755operation.
     756
     757More insidiously, such attackers might attempt to abuse a hash
     758collision in order to fool the archive. If they have a way of creating
     759a file that, for instance, has the same hash as your shadow password
     760file, then Ugarit will think that it already has that file when it
     761attempts to snapshot it, and store a reference to the existing
     762file. If that snapshot is restored, then they will receive a copy of
     763your shadow password file. Similarly, if they can predict a future
     764hash of your shadow password file, and create a shadow password file
     765of their own (perhaps one giving them a root account with a known
     766password) with that hash, they can then wait for the real shadow
     767password file to have that hash. If the system is later restored from
     768that snapshot, then their chosen content will appear in the shadow
     769password file. However, doing this requires a very fundamental break
     770of the hash function being used.
     771
     772Recommendations: Think carefully about who has write access to your
     773filesystems, directly or indirectly via a network service that stores
     774received data to disk. Enforce quotas where appropriate, and consider
     775not backing up "queue directories" where untrusted content might
     776appear; migrate incoming content that passes acceptance tests to an
     777area that is backed up. If necessary, the queue might be backed up to
     778a non-snapshotting system, such as by rsyncing to another server, so
     779that any excessive files that appear there are removed from the backup
     780in due course, while still affording protection.
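
For example, a mail queue could be mirrored off-host with something
like the following (paths are illustrative), letting `--delete` drop
junk files from the mirror once they are removed from the queue:

    rsync -a --delete /var/spool/mqueue/ backup-host:/srv/queue-mirror/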
    270781
    271782# Future Directions
     
    274785
    275786## General
     787
     788* More checks with `double-check` mode activated. Perhaps read blocks
     789  back from the archive to check they match the blocks sent (sketched
     790  below), to detect hash collisions. Maybe have levels of double-check-ness.
    276791
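A minimal sketch of that read-back check, in the spirit of the Scheme
the rest of Ugarit is written in (`backend-put!` and `backend-get` are
hypothetical names, not the real internal API):

    ;; Store a block, then immediately re-fetch it and compare it with
    ;; what we sent; a mismatch indicates corruption in transit or
    ;; storage, or a hash collision.
    (define (put-with-double-check! backend key block)
      (backend-put! backend key block)
      (unless (equal? (backend-get backend key) block)
        (error "double-check: stored block does not match" key)))
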
    277792* Everywhere I use (sql ...) to create an sqlite prepared statement,
     
    285800## Backends
    286801
     802* Look at http://bugs.call-cc.org/ticket/492 - can this help?
     803
     804* Extend the backend protocol with a special "admin" command that
     805  allows for arbitrary backend-specific operations, and write an
     806  ugarit-backend-admin CLI tool to administer backends with it. The
     807  input should be a single s-expression as a list, and the result
     808  should be an alist which is displayed to the user in a friendly
     809  manner, as "Key: Value\n" lines.
     810
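As a sketch of the intended exchange (`run-admin-command!` is a
hypothetical name for whatever entry point the protocol grows):

    ;; The tool sends one s-expression, e.g.:
    ;;   (run-admin-command! backend '(info))
    ;; and gets back an alist such as:
    ;;   (("Backend type" . "splitlog") ("Files" . "3"))
    ;; which it renders as "Key: Value" lines:
    (define (display-admin-result alist)
      (for-each
       (lambda (pair)
         (display (car pair)) (display ": ")
         (display (cdr pair)) (newline))
       alist))
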
     811* Implement "info" admin commands for all backends, listing any
     812  available stats, and at least the backend type and parameters.
     813
    287814* Support for recreating the index and tags on a backend-splitlog if
    288   they get corrupted, from the headers left in the log. Do this by
    289   extending the backend protocol with a special "admin" command that
    290   allows for arbitrary backend-specific operations, and write an
    291   ugarit-backend-admin CLI tool to administer backends with it.
     815  they get corrupted, from the headers left in the log, as a "reindex"
     816  admin command.
     817
     818* Support for flushing the cache on a backend-cache, via an admin
     819  command.
    292820
    293821* Support for unlinking in backend-splitlog, by marking byte ranges as
     
    296824  and removing the entries for the unlinked blocks, perhaps provide an
    297825  option to attempt to re-use existing holes to put blocks in for
    298   online reuse, and provide an offline compaction operation.
     826  online reuse, and provide an offline compaction operation. Keep
     827  stats in the index of how many byte ranges and how many bytes are
     828  unused in each file, and report them in the info admin
     829  interface, along with the option to compact any or all files.
     830
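The bookkeeping behind those stats could be as simple as this sketch,
assuming each unused range is recorded as a `(start . length)` pair:

    ;; Given the unused byte ranges recorded for one log file, return
    ;; the two figures the info interface would report:
    ;; (number-of-unused-ranges . total-bytes-unused)
    (define (hole-stats holes)
      (cons (length holes)
            (apply + 0 (map cdr holes))))

    ;; (hole-stats '((0 . 512) (4096 . 128)))  => (2 . 640)
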
     831* Have read-only and unlinkable config flags in the backend-splitlog
     832  metadata file, settable via admin commands.
     833
     834* Optional support in backends for keeping a log of tag changes, and
     835  admin commands to read the log.
    299836
    300837* Support for SFTP as a storage backend. Store one file per block, as
     
    311848  will be uploaded to enough archives to make the total trust be at
    312849  least 100%, by randomly picking the archives weighted by their write
    313   load weighting. A local cache will be kept of which backends carry
    314   which blocks, and reads will be serviced by picking the archive that
    315   carries it and has the highest read load weighting. If that archive
    316   is unavailable or has lost the block, then they will be trued in
    317   read load order; and if none of them have it, an exhaustive search
    318   of all available archives will be performed before giving up, and
    319   the cache updated with the results if the block is found. Users will
    320   be recommended to delete the cache if an archive is lost, so it gets
    321   recreated in usage, as otherwise the system may assume blocks are
    322   present when they are not, and thus fail to upload them when
    323   snapshotting. The individual physical archives that we put
    324   replication on top of won't be "valid" archives unless they are 100%
    325   replicated, as they'll contain references to blocks that are on
    326   other archives. It might be a good idea to mark them as such with a
    327   special tag to avoid people trying to restore directly from them. A
    328   copy of the replication configuration could be stored under a
    329   special tag to mark this fact, and to enable easy finding of the
    330   proper replicated archive to work from.
     850  load weighting. A read-only archive automatically gets its write
     851  load weighting set to zero, and a warning issued if it was
     852  configured otherwise. A local cache will be kept of which backends
     853  carry which blocks, and reads will be serviced by picking the
     854  archive that carries it and has the highest read load weighting. If
     855  that archive is unavailable or has lost the block, then they will be
     856  tried in read load order; and if none of them have it, an exhaustive
     857  search of all available archives will be performed before giving up,
     858  and the cache updated with the results if the block is found. To
     859  correctly handle archives that were unavailable during the search,
     860  we might need to log an "unknown" for that block key / archive
     861  pair, rather than assuming the block is not there, and check it
     862  later. Users will be given an admin command to notify the backend
     863  of an archive
     864  going missing forever, which will cause it to be removed from the
     865  cache. Affected blocks should be examined and re-replicated if their
     866  replication count is now too low. Another command should be
     867  available to warn of impending deliberate removal, which will again
     868  remove the archive from the cluster and re-replicate, the difference
     869  being that the disappearing archive is usable for re-replicating
     870  FROM, so this is a safe operation for blocks that are only on that
     871  one archive. The individual physical archives
     872  that we put replication on top of won't be "valid" archives unless
     873  they are 100% replicated, as they'll contain references to blocks
     874  that are on other archives. It might be a good idea to mark them as
     875  such with a special tag to avoid people trying to restore directly
     876  from them. A copy of the replication configuration could be stored
     877  under a special tag to mark this fact, and to enable easy finding of
     878  the proper replicated archive to work from. There should be a
     879  configurable option to snapshot the cache to the archives whenever
     880  the replicated archive is closed, too. The command line to the
     881  backend, "backend-replicated", should point to an sqlite file for
     882  the configuration and cache, and users should use admin commands to
     883  add/remove/modify archives in the cluster.
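
The selection policy described above might look like this sketch
(representation and names are illustrative - each archive is a
`(name trust write-weight)` list, weights are positive integers, and
the total trust across all archives is assumed to be at least 100%):

    (use extras) ; `random` lives in the extras unit on CHICKEN 4

    ;; Pick one archive at random, with probability proportional to
    ;; its write load weighting (the third element).
    (define (pick-weighted archives)
      (let ((r (random (apply + (map caddr archives)))))
        (let loop ((as archives) (acc 0))
          (let ((acc (+ acc (caddr (car as)))))
            (if (or (null? (cdr as)) (> acc r))
                (car as)
                (loop (cdr as) acc))))))

    ;; Keep picking distinct archives until their accumulated trust
    ;; (the second element) reaches 100%.
    (define (choose-targets archives)
      (let loop ((chosen '()) (trust 0))
        (if (>= trust 100)
            chosen
            (let ((a (pick-weighted archives)))
              (if (member a chosen)
                  (loop chosen trust)
                  (loop (cons a chosen) (+ trust (cadr a))))))))

    ;; e.g. (choose-targets '(("s3" 60 1) ("nas" 60 2) ("usb" 40 1)))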
    331884
    332885## Core
     
    427980  the framework for this and then let him add API functions as he desires.
    428981
     982* Command-line support to extract the contents of a given path in the
     983  archive, rather than needing to use explore mode. Also an option to
     984  extract content given just a block key (useful when reading from keys
     985  logged manually at snapshot time, or from a backend that has a tag log).
     986
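Hypothetical invocations (neither subcommand exists yet; the syntax is
purely illustrative):

    $ ugarit extract ugarit.conf /myhost/current/etc /tmp/restored
    $ ugarit extract ugarit.conf --key a9c811... /tmp/restored
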
    429987* FUSE support. Mount it as a read-only filesystem :-D Then consider
    430988  adding Fossil-style writing to the `current` of a snapshot, with
     
    5251083# Version history
    5261084
     1085* 1.1: Consistency check on read blocks by default. Removed warning
     1086  about deletions from backend-cache; we need a new mechanism to report
     1087  warnings from backends.
     1088
    5271089* 1.0: Migrated from gdbm to sqlite for metadata storage, removing the
    5281090  GPL taint. Unit test suite. backend-cache made into a separate