Distributed Storage System Project Wiki

Chord DHT (Theory)

19pages on
this wiki
Add New Page
Talk0 Share

Chord is a peer-to-peer lookup algorithm for finding a single node in a structured network of peers as a rendezvous point for a given key, which is an index for a desired entity of information. It is one of the four original distributed hash table protocols, along with CAN, Tapestry, and Pastry.

    General Mechanism Edit

The participating nodes are assigned a random 160 bit identifier from this identifier space using consistent hashing. This number when assigned to a node is called node id. These nodes are logically organized in a circle such that circle can have ids/keys ranging from 0 to .

Each node has a successor and a predecessor. The successor to a node or key is the next node in the identifier circle when you move clockwise. The predecessor of a node or key is the next node in the id circle when you move counter-clockwise.

A file/data item with key k ( i.e. HASH[file] = k ) is mapped to the node whose id k. This node is referred to as the successor of k - succ(k).

For example, on considering a 6-bit identifier space, the total possible id's a node can pick up are = 64
Assume only 10 nodes exist with hash id's(assigned randomly) as -
1 , 5 , 12 , 16 , 25 , 32 , 38 , 42 , 46 , 55.

If HASH(f1) = 34 then succ(34)=38 ; HASH(f2) = 8 then succ(8)=12 .(Note that all are in same name-space).

Since the successor (or predecessor) node may disappear from the network (because of failure or departure), each node records a whole segment of the circle adjacent to it, i.e. the K nodes preceding it and the K nodes following it. One successor and predecessor are kept in a list to maintain a high probability that the successor and predecessor pointers actually point to the correct nodes after possible failure or departure of the initial successor or predecessor.

    Key Resolving

The main issue is to efficiently resolve a key k to the address of succ(k). To connect all these nodes, a circular doubly-linked-list kind of structure is maintained where every node is aware if its successor and predecessor. Successor of a node id p is succ(p+1) and Predecessor as pred(p).

Therefore a linear approach is used to find a node with complexity is of order 0(n). This is a naive method for searching the network.

However, to make more scalable and efficient, each node maintains a finger table which acts like an indexer thus helps in reducing the complexity.The max size of a finger-table for any node p, FTp is restricted to the bit-size of hash algorithm i.e. 160 entries by default and 6 entries as considered in the example.

Its populated for node p using formula :  FTp = succ( p + 2 pow[i-1] ). This effectively translates into "the i-th entry point succeeding p in incremental powers of 2".

Assume node n wants to resolve a query for key k. Let p be the node that contains k. We will analyze the number of steps to reach p. Let i be such that p is in , , n + 2^i . Node n will contact the smallest node in this interval; call this node f. Fact: f is closer to p than to n. Therefore, in one step, the distance to p decreases by at least half.

Define a node p that has a query for a key k. Suppose node q is the node that the key k maps to in Chord (p  ≠ q). Therefore, node p forwards its query to node q , the closest predecessor of k in its finger table, call it the i-th interval of node n, somewhere between p and q. The new distance between f and p is then at most 2^(i-1) . Reiterating, each time the distance at least halves and within m steps (with m as bit space for identifier) the query will arrive at node q. Since the identifiers are random after log N forwardings, the probability is 2^m/N) and the expected number of identifiers in this interval is 1 with high probability, so only 0(log N) nodes need to be contacted.

If Chord keeps track of r = 0(log (N)) predecessors/successors, then with high probability, if each node has probability of 1/4 of failing, succ(k) and pred(k) will return the correct nodes.

Simply, the probability that all r nodes fail is = 0(1/N), which is a low probability; so with high probability at least one of them is alive and the node will have the correct pointer.

The API's formed are :

_________________________ _________________________________________________________________
|    Function                                          Description                                                                                     
|---------------------------------- --------------------------------------------------------------------------------------------
| get_successor_list(n)           Contacts Chord node n and returns n’s successor list. Each node in     
|                                               the list includes its Chord ID, IP address and synthetic coordinates.    
|---------------------------------- ---------------------------------------------------------------------------------------------
| lookup(k, num)                    Returns a list of at least num successors of key k. Each node in the    
|                                              list includes its Chord ID, IP address and synthetic coordinates.           
|________________________ __________________________________________________________________

get_successor_list(n) is a simple accessor method for the Chord node n. It is implemented as a single network RPC call.

lookup(k,num) must send O(log N ) RPCs in order to determine the m successors of key k. The value of m affects the latency of the lookup. Higher values of m constrain the lookup routing to travel through specific – potentially high latency – nodes. For example, when m = 16, the lookup finds the exact predecessor of k and request its successor list.

Smaller values of num permit flexibility in the routing which allows high latency nodes to be avoided. A node uses synthetic coordinates to estimate the latency to another node.

By default, lookup(k) is regarded as lookup(k,1) where only one succ(k) is returned.

    Insertion and Retrieval Edit

Insertion and retrieval of a file b with k=HASH(b) is done using the the API's
|    Function       |                Description                                                                  |    
|   put(k, b)        |   Stores the block b under the key k, where k = SHA-1(b).      |
|   get(k)            |   Fetches and returns the block associated with the key k.     |

Block Insert : put(k)

When an application wishes to insert a new block, it calls the put(k, b) procedure. The algorithm is as follows

void put (k, b)
{   succ = lookup (k)  
    send (succs.ipaddr, k, b)


The latency of the complete put() operation is likely to be dominated by the maximum round trip time to any of the successor. The lookup is likely to be relatively low latency: proximity routing allows it to contact nearby nodes. The cost of the Chord lookup is likely to be dominated by the latency to the nearest of the predecessors.

Block Fetch: get(k)

In order to fetch a block, a client application calls get(k). A call to lookup(k) is initiated in order to find the list of nodes likely to hold the block. get() then sends the lookup(k) result an RPC to request a fragment of key k, in parallel.

An application may occasionally need to repeatedly invoke get(k) to successfully fetch a given key. Ths algorithm for get(k) is as follows :

block get (k)
{  succs = lookup (k)  
   <ret, data> = download (key, succ[i])
   if (ret == OK)
        frags.push (data)
   return (SHA-1(block) != k) ? FAILURE : block  
   return FAILURE

     Node Joining and Leaving a Chord System :

If any node p wants to join, it contacts any arbitary node and requests lookup for succ(p+1) and pred(p+1) joining is relatively simple just like inserting a node in middle of linked list.

However, complexity arises in is maintaining the finger-tables. Its important to note that FTq[1] = succ(q+1) i.e the immidiate next node. In order to maintain this consistency, a simple RPC is run periodically which contacts succ(q+1) and asks it to return pred(succ(q+1)) which should return q. If it doesn't (i.e q's successor has updated info about predecessor)then returned node x is made its successor and asks x to update predecessor list.
In this manner, all successors in the finger-table have to be contacted by issuing RPC request succ(k).

      Potential uses (with respect to our system)Edit

  • Cooperative Mirroring: A load balancing mechanism by a local network hosting information available to computers outside of the local network. This scheme could allow developers to balance the load between many computers instead of a central server to ensure availability of their product.
  • Time-shared storage: In a network, once a computer joins the network its available data is distributed throughout the network for retrieval when that computer disconnects from the network. As well as other computers' data is sent to the computer in question for off-line retrieval when they are no longer connected to the network. Mainly for nodes without the ability to connect full time to the network.

Ad blocker interference detected!

Wikia is a free-to-use site that makes money from advertising. We have a modified experience for viewers using ad blockers

Wikia is not accessible if you’ve made further modifications. Remove the custom ad blocker rule(s) and the page will load as expected.