Celeste is a highly-available, ad hoc, distributed, peer-to-peer data store. The system implements semantics for data creation, deletion, arbitrary read and write in a strict-consistency data model.
Celeste divides data into fragments and replicates each fragment on multiple nodes in the system. Celeste dynamically caches and re-caches fragment replicas to ensure that a minimum number of them exist in the system to protect against loss and that enough fragments are available to recover the entirety of the data. Celeste clients create, write, and subsequently read data despite the fact that some of the nodes, and consequently some of the fragments, may be unavailable.
- Address of a node is partitioned into chunks.
- Successive chunks route successive hops.
- Routing table maintained by each node:
- Columns are indexed by hop number; rows are indexed by the value of that hop's chunk
- Address size 256 bits
- Split into n chunks of k=256/n bits each
- Example of address: for a 6-bit address split into 2-bit chunks (n = 3, k = 2), 102, 321 etc. are valid node addresses
- Each routing table has 2^k rows, numbered 0, 1, 2, ... 2^k - 1, and n columns
- In any ith column
- Row 0 points to self
- Row 1 points to any neighbour node having ith chunk of the address as 1
- Row 2^k - 1 points to any neighbour node having ith chunk of the address as 2^k - 1
- Finding an address abcd...lmn:
- Node checks row with entry a and retrieves the neighbour address X from column 1.
- Node sends data to X.
- X finds the neighbour mentioned in row 'b' and column 2 of its table and forwards to that address.
- This goes on till nth column when:
- Either destination is reached.
- Or a node does not have any entry in the row it looks up.
- In this case, the node will use the address in the next row of its table and forward the data there. Even if all rows are empty, the row '0' must have the node's own address, which will become the final destination of the packet.
- Consequently, data addressed to a destination is not guaranteed to reach that destination.
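The lookup steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: addresses are strings with one digit per chunk (as in the 102 / 321 example), k = 2 bits per chunk, and "the next row" is taken to wrap around the table. The `Node` class, `route` function and the three-node network are hypothetical and not part of Celeste.

```python
from typing import Dict, List, Optional

CHUNK_VALUES = 4  # 2**k rows, assuming k = 2 bits per chunk as in the 6-bit example


class Node:
    def __init__(self, address: str) -> None:
        self.address = address
        columns = len(address)  # n: one column per chunk / hop
        # table[row][col] holds a neighbour whose col-th chunk equals `row`.
        self.table: List[List[Optional[str]]] = [
            [None] * columns for _ in range(CHUNK_VALUES)
        ]
        self.table[0] = [address] * columns  # row 0 points to self

    def next_hop(self, dest: str, hop: int) -> str:
        wanted = int(dest[hop])
        # Assumption: "the next row" wraps around, so the scan always passes row 0.
        for offset in range(CHUNK_VALUES):
            entry = self.table[(wanted + offset) % CHUNK_VALUES][hop]
            if entry is not None:
                return entry
        return self.address  # unreachable while row 0 is populated


def route(nodes: Dict[str, Node], start: str, dest: str) -> List[str]:
    """Return the list of nodes visited while routing from start towards dest."""
    path = [start]
    current = start
    for hop in range(len(dest)):
        if current == dest:
            break
        nxt = nodes[current].next_hop(dest, hop)
        if nxt == current:
            break  # fell back to self: this node becomes the final destination
        path.append(nxt)
        current = nxt
    return path


# Hypothetical three-node network.
nodes = {a: Node(a) for a in ("102", "310", "321")}
nodes["102"].table[3][0] = "310"  # neighbour whose first chunk is 3
nodes["310"].table[2][1] = "321"  # neighbour whose second chunk is 2

print(route(nodes, "102", "321"))  # reaches the intended destination
print(route(nodes, "102", "231"))  # no node 231 exists: the packet stops elsewhere
```

The second call shows the point made in the last bullet: when a row is empty the packet is forwarded through the fallback rows and ends at a node other than the one addressed.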
A Celeste file consists of three kinds of objects:
- A single Anchor Object (AObject) per file
- A single Version Object (VObject) per file version
- Zero or more Block Objects (BObject) each containing file data.
In addition to these file related objects, Celeste creates a set of objects that maintain the file's mutable state. These objects keep track of the mapping between the file itself, as represented by its AObject, and its current version, as represented in a VObject.
Each of these objects maintains parameters for its own replication, and may maintain replication parameters for the objects that it controls. For example, each AObject also records the replication parameters for each of its corresponding VObjects and BObjects. Similarly each VObject records the replication parameters for each of its BObjects.
New objects created as a result of a file update inherit their replication parameters from their corresponding controlling object.
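As a rough illustration of how replication parameters flow from a controlling object to the objects it creates, the relationship can be modelled as below. The class layout, the field names and the single `replicas` parameter are assumptions made for this sketch; Celeste's actual object format is not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ReplicationParams:
    replicas: int  # hypothetical: the only parameter modelled in this sketch


@dataclass
class BObject:
    data: bytes
    replication: ReplicationParams


@dataclass
class VObject:
    replication: ReplicationParams
    bobject_replication: ReplicationParams  # parameters for the blocks it controls
    blocks: List[BObject] = field(default_factory=list)

    def add_block(self, data: bytes) -> BObject:
        # A new BObject inherits its parameters from the controlling VObject.
        block = BObject(data, self.bobject_replication)
        self.blocks.append(block)
        return block


@dataclass
class AObject:
    replication: ReplicationParams
    vobject_replication: ReplicationParams
    bobject_replication: ReplicationParams

    def new_version(self) -> VObject:
        # A new VObject inherits its parameters from the controlling AObject.
        return VObject(self.vobject_replication, self.bobject_replication)


anchor = AObject(ReplicationParams(5), ReplicationParams(3), ReplicationParams(2))
version = anchor.new_version()
block = version.add_block(b"file data")
```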
File Creation
Replication parameters for each of the AObject, VObject, and BObjects are supplied during file creation and are used for the lifetime of the file.
The number of data replicas created is also supplied when the data is stored in the Celeste system. These replicas are created during the write process and the write is not complete until all replicas are safely in place.
This number has a direct impact on the amount of time it takes to store data in Celeste. A low value reduces the time a write takes to complete, at the risk that data will be unavailable when needed because too few copies exist. A high value increases the time a write takes to complete, but ensures that more replicas are available to hedge against failure. Choosing this number is a function of the required availability of the data and of the number of objects, out of the pool of all objects, that are missing at any given moment. Each of these parameters is in turn a function of other variables.
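One simple way to reason about this trade-off is a back-of-the-envelope model. It assumes that each replica sits on a node that is reachable independently with probability p, which is a simplification introduced here and not something Celeste specifies. Under that assumption, at least one of r replicas is reachable with probability 1 - (1 - p)^r.

```python
def availability(p: float, replicas: int) -> float:
    """Probability that at least one replica is reachable (independence assumed)."""
    return 1.0 - (1.0 - p) ** replicas


def replicas_needed(p: float, target: float) -> int:
    """Smallest replica count whose modelled availability meets the target."""
    if not (0.0 < p <= 1.0 and 0.0 <= target < 1.0):
        raise ValueError("need 0 < p <= 1 and 0 <= target < 1")
    count = 1
    while availability(p, count) < target:
        count += 1
    return count


print(availability(0.5, 3))
print(replicas_needed(0.5, 0.99))
```

In this model each additional replica adds write latency while shrinking the unavailability term (1 - p)^r geometrically, which matches the qualitative trade-off described above.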
The mechanism that maintains the Anchor Object to Version Object mapping is implemented as a map from the AObject object-id to the VObject object-id. The map is implemented as a fault-scalable, Byzantine fault-tolerant variable. The variable requires the maintenance of a set of stored objects, each of which is stored on a different node in the system. When the variable is needed, some of these objects may be missing, out-of-date, or behaving maliciously (Byzantine).
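To give a feel for how a reader can cope with missing, stale and malicious copies, here is a minimal sketch of a quorum-style read. It is not Celeste's actual protocol: the reply format, the `read_mapping` name and the rule "accept the newest value vouched for by at least f + 1 replicas" are assumptions chosen for illustration, where f is the number of malicious replicas to tolerate.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical reply format: (version number, VObject object-id).
Reply = Tuple[int, str]


def read_mapping(replies: Iterable[Optional[Reply]], f: int) -> Optional[Reply]:
    """Return the newest reply reported by at least f + 1 replicas, if any.

    None entries stand for replicas that did not answer. With at most f
    malicious replicas, f + 1 identical replies include one honest replica.
    """
    counts = Counter(reply for reply in replies if reply is not None)
    vouched = [reply for reply, seen in counts.items() if seen >= f + 1]
    return max(vouched, default=None)  # highest version number wins


replies = [(2, "vobj-2"), (2, "vobj-2"), (1, "vobj-1"), None, (9, "forged")]
print(read_mapping(replies, f=1))
```

Here the stale reply and the forged high-version reply are each reported by a single replica, so neither is accepted.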
Free Haven is a research project that aims to deploy a system for distributed, anonymous, persistent data storage which is robust against attempts by powerful adversaries to find and destroy any stored data.
Main goals of the Free Haven Project:
- Two-Way Anonymity: neither the owner of a file knows where it is stored, nor does the host of the file know the owner's identity.
- Accountability: reputation and micropayment schemes, which allow limiting the damage done by misbehaving/malicious servers.
- Persistence of files over time.
- Flexibility: nodes can dynamically join & leave without affecting the functioning of the system.
Each file is divided into "shares" through erasure coding. This allows retrieval of some k out of n shares to be sufficient for complete file reconstruction.
Shares of a file can be stored anywhere on the grid. The owner of the file keeps no record or metadata of where they are stored.
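The k-out-of-n property can be demonstrated with a toy code: split the data into two halves and add their XOR as a third share, so that any k = 2 of the n = 3 shares rebuild the file. This is a deliberately simple stand-in chosen for illustration, not the coding scheme Free Haven uses, and the function names are hypothetical.

```python
from typing import Dict


def _xor(left: bytes, right: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(left, right))


def encode(data: bytes) -> Dict[int, bytes]:
    """Produce 3 shares: first half, second half, and their XOR parity."""
    if len(data) % 2:
        data += b"\x00"  # pad to even length; decode() trims it again
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return {0: first, 1: second, 2: _xor(first, second)}


def decode(shares: Dict[int, bytes], length: int) -> bytes:
    """Rebuild the original data from any 2 of the 3 shares."""
    if len(shares) < 2:
        raise ValueError("at least 2 shares are required")
    if 0 in shares and 1 in shares:
        first, second = shares[0], shares[1]
    elif 0 in shares:
        first = shares[0]
        second = _xor(first, shares[2])  # second = first XOR parity
    else:
        second = shares[1]
        first = _xor(second, shares[2])  # first = second XOR parity
    return (first + second)[:length]


data = b"free haven!"
shares = encode(data)
without_first = {i: s for i, s in shares.items() if i != 0}
print(decode(without_first, len(data)))
```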
- Anonymously inserting a file into the grid.
A user asks any server for a file, passing along the receiver's location and a key with which the document can be delivered privately.
The server then broadcasts a request for the file into the network.
The hosts holding shares of the file encrypt them and deliver them to the receiver's location.
- Anonymous retrieval
Trust System
Servers which drop shares or are otherwise unreliable get noticed after a while and are trusted less. A trust module on each server maintains a database of every other server, based on past direct experience and also on what other servers have said. If a "bad" server has a poor trust level, nodes will prefer not to store files on it.
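A trust module of this kind could be sketched as follows. Everything in it is hypothetical: the class and method names, the neutral score of 0.5 for unknown servers, and the 70/30 weighting between direct experience and referrals are assumptions for illustration, not Free Haven's actual scoring rules.

```python
from typing import Dict, List


class TrustModule:
    def __init__(self, direct_weight: float = 0.7) -> None:
        self.direct_weight = direct_weight  # assumed weighting, not from the project
        self.kept: Dict[str, int] = {}
        self.dropped: Dict[str, int] = {}
        self.referrals: Dict[str, List[float]] = {}

    def record_experience(self, server: str, kept_share: bool) -> None:
        """Record whether `server` kept or dropped a share we gave it."""
        book = self.kept if kept_share else self.dropped
        book[server] = book.get(server, 0) + 1

    def record_referral(self, server: str, score: float) -> None:
        """Record what another server said about `server` (score in 0..1)."""
        self.referrals.setdefault(server, []).append(score)

    def trust(self, server: str) -> float:
        kept = self.kept.get(server, 0)
        total = kept + self.dropped.get(server, 0)
        direct = kept / total if total else 0.5  # unknown servers start neutral
        heard = self.referrals.get(server)
        if not heard:
            return direct
        reported = sum(heard) / len(heard)
        return self.direct_weight * direct + (1 - self.direct_weight) * reported

    def pick_host(self, candidates: List[str]) -> str:
        """Prefer the most trusted candidate when placing a share."""
        return max(candidates, key=self.trust)


module = TrustModule()
for _ in range(2):
    module.record_experience("good", kept_share=True)
    module.record_experience("bad", kept_share=False)
module.record_referral("bad", 0.2)
print(module.pick_host(["bad", "good"]))
```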
Issue: the "bad" server is still able to store its own files on the grid, thus gaining storage space while effectively contributing none.
Pricing Scheme
Each user is given a limited amount of storage, corresponding to the amount they contribute, to ensure that everyone has enough storage space available.
Google File System