# Chord DHT (Theory)

*19*pages on

this wiki

Chord is a peer-to-peer lookup algorithm for finding a single node in a structured network of peers as a rendezvous point for a given key, which is an index for a desired entity of information. It is one of the four original distributed hash table protocols, along with CAN, Tapestry, and Pastry.

## General Mechanism Edit

The participating nodes are assigned a random 160 bit identifier from this identifier space using consistent hashing. This number when assigned to a node is called node id. These nodes are logically organized in a circle such that circle can have ids/keys ranging from 0 to .

Each node has a *successor* and a *predecessor*. The *successor* to a node or key is the next node in the identifier circle when you move clockwise. The *predecessor* of a node or key is the next node in the id circle when you move counter-clockwise.

A file/data item with key k ( i.e. *HASH[file] = k* ) is mapped to the node whose *id k*. This node is referred to as the *successor* of k - *succ(k)*.

For example, on considering a 6-bit identifier space, the total possible id's a node can pick up are = 64

Assume only 10 nodes exist with hash id's(assigned randomly) as -

1 , 5 , 12 , 16 , 25 , 32 , 38 , 42 , 46 , 55.

If *HASH(f1) = *34 then

*succ(34)*=38 ;

*HASH(f2)*= 8 then

*succ(8)=12*.(Note that all are in same name-space).

Since the successor (or predecessor) node may disappear from the network (because of failure or departure), each node records a whole segment of the circle adjacent to it, i.e. the K nodes preceding it and the K nodes following it. One *successor* and *predecessor* are kept in a list to maintain a high probability that the successor and predecessor pointers actually point to the correct nodes after possible failure or departure of the initial successor or predecessor.

##

Key Resolving Edit

The main issue is to efficiently resolve a key k to the address of *succ(k)*. To connect all these nodes, a *circular doubly-linked-list* kind of structure is maintained where every node is aware if its successor and predecessor. *Successor* of a node id *p* is *succ(p+1)* and *Predecessor* as *pred(p).*

Therefore a linear approach is used to find a node with complexity is of order 0(n). This is a naive method for searching the network.

However, to make more scalable and efficient, each node maintains a *finger table* which acts like an indexer thus helps in reducing the complexity.The max size of a finger-table for any node *p, FTp *is restricted to the bit-size of hash algorithm i.e. 160 entries by default and 6 entries as considered in the example.

Its populated for node p using formula : *FTp = succ( p + 2 pow[i-1] )*. This effectively translates into "*the i-th entry point succeeding p in incremental powers of 2*".

Assume node n wants to resolve a query for key k. Let p be the node that contains k. We will analyze the number of steps to reach p. Let i be such that p is in ,** **__ , n + 2^i __. Node n will contact the smallest node in this interval; call this node f. Fact: f is closer to p than to n. Therefore, in one step, the distance to p decreases by at least half.

Define a node p that has a query for a key k. Suppose node q is the node that the key k maps to in Chord (p ≠ q). Therefore, node p forwards its query to node q , the closest predecessor of k in its finger table, call it the i-th interval of node n, somewhere between p and q. The new distance between f and p is then at most

**2^(i-1)**. Reiterating, each time the distance at least halves and within m steps (with m as bit space for identifier) the query will arrive at node q. Since the identifiers are random after

*forwardings, the probability is*

**log N****2^m/N**) and the expected number of identifiers in this interval is 1 with high probability, so only

*0(log N)*nodes need to be contacted.

If Chord keeps track of *r = 0(log (N))* predecessors/successors, then with high probability, if each node has probability of 1/4 of failing, succ(k) and pred(k) will return the correct nodes.

Simply, the probability that all r nodes fail is = 0(1/N), which is a low probability; so with high probability at least one of them is alive and the node will have the correct pointer.

The API's formed are :

_________________________ _________________________________________________________________

| Function Description

|---------------------------------- --------------------------------------------------------------------------------------------

| get_successor_list(n) Contacts Chord node n and returns n’s successor list. Each node in

| the list includes its Chord ID, IP address and synthetic coordinates.

|---------------------------------- ---------------------------------------------------------------------------------------------

| lookup(k, num) Returns a list of at least num successors of key k. Each node in the

| list includes its Chord ID, IP address and synthetic coordinates.

|________________________ __________________________________________________________________

*get_successor_list(n)* is a simple accessor method for the Chord node n. It is implemented as a single network RPC call.

*lookup(k,num) *must send *O(log N )* RPCs in order to determine the m successors of key k. The value of m affects the latency of the lookup. Higher values of m constrain the lookup routing to travel through specific – potentially high latency – nodes. For example, when m = 16, the lookup finds the exact predecessor of k and request its successor list.

Smaller values of num permit flexibility in the routing which allows high latency nodes to be avoided. A node uses synthetic coordinates to estimate the latency to another node.

By default, *lookup(k)* is regarded as *lookup(k,1)* where only one *succ(k)* is returned.** **

## Insertion and Retrieval Edit

Insertion and retrieval of a file *b* with *k=HASH(b)* is done using the the API's

_________________________________________________________________________

| Function | Description |

|--------------------|--------------------------------------------------------------------------------|

| put(k, b) | Stores the block b under the key k, where k = SHA-1(b). |

|--------------------|--------------------------------------------------------------------------------|

| get(k) | Fetches and returns the block associated with the key k. |

|______________|_________________________________________________________|

**Block Insert : put(k)**

When an application wishes to insert a new block, it calls the put(k, b) procedure. The algorithm is as follows

*void put (k, b){ succ = lookup (k) send (succs.ipaddr, k, b)*

*}*

The latency of the complete put() operation is likely to be dominated by the maximum *round trip time *to any of the *successor*. The lookup is likely to be relatively low latency: proximity routing allows it to contact nearby nodes. The cost of the Chord lookup is likely to be dominated by the latency to the nearest of the *predecessors*.

**Block Fetch: get(k)**

In order to fetch a block, a client application calls *get(k)*. A call to *lookup(k)* is initiated in order to find the list of nodes likely to hold the block. *get() *then sends the *lookup(k)* result an RPC to request a fragment of key k, in parallel.

An application may occasionally need to repeatedly invoke *get(k)* to successfully fetch a given key. Ths algorithm for *get(k)* is as follows :

*block get (k){ succs = lookup (k) <ret, data> = download (key, succ[i]) if (ret == OK) frags.push (data) return (SHA-1(block) != k) ? FAILURE : block return FAILURE}*

##

Node Joining and Leaving a Chord System : Edit

If any node p wants to join, it contacts any arbitary node and requests lookup for *succ(p+1) *and *pred(p+1)* joining is relatively simple just like inserting a node in middle of linked list.

However, complexity arises in is maintaining the finger-tables. Its important to note that *FTq[1]* = *succ(q+1)* i.e the immidiate next node. In order to maintain this consistency, a simple RPC is run periodically which contacts *succ(q+1)* and asks it to return *pred(succ(q+1))* which should return q. If it doesn't (i.e *q's successor* has updated info about *predecessor*)then returned node x is made its successor and asks x to update predecessor list.

In this manner, all successors in the finger-table have to be contacted by issuing RPC request *succ(k)*.

## Potential uses *(with respect to our system)*Edit

: A load balancing mechanism by a local network hosting information available to computers outside of the local network. This scheme could allow developers to balance the load between many computers instead of a central server to ensure availability of their product.*Cooperative Mirroring*: In a network, once a computer joins the network its available data is distributed throughout the network for retrieval when that computer disconnects from the network. As well as other computers' data is sent to the computer in question for off-line retrieval when they are no longer connected to the network. Mainly for nodes without the ability to connect full time to the network.__Time-shared storage__