Thursday, October 30, 2008

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

In what has got to be the most-cited DHT paper, the authors develop a protocol providing the "put" and "lookup" primitives in the wide area and in a completely distributed manor. The basic structure is of a "ring" of keys, which is just to say that keys are chosen from Z/2^nZ. What is very nice about their design is that correctness depends only on a single invariant: the fact that the "next" pointer each chord node maintains is correct. This allows them to analyze some of the basic properities of the system without some additional complexity; for instance, the optimization for log-time lookups (the finger table) is not required for the lookups to work, as you can always reach any key in linear time by following the "next" pointers around the ring. They also work out in some detail the join, leave, repair, and stabilization protocols which allows the DHT ring to remain stable in the face of various types of event.

I don't think there's a question that DHTs are a nice idea with some good applications; the MIT group responsible for this paper went on to explore some more or less obvious uses of them such as Ivy, which is a DHT-based file system (actually, there are many ivy's.) In the first paper the authors contrast their flat key space and completely distributed algorithm with more "traditional" distributed system which function by imposing hierarchy on the collection of machines; they cite FastTrack and DNS. While there is a benefit to being distributed, it also seems to suffer from a lack of "authoritative answers" that DNS can provide; the system there is fundamentally hierarchical which makes it perhaps not the best example. The completely distributed nature also make caching more of an issue. For instance, in a hierarchical system with high fanout but relatively low tree depth, if the nodes closer to the leaves of the tree cache results they can deflect a good deal of traffic from traveling up the tree. This is the equivalent of your ISP's DNS server caching recent DNS queries. Since all traffic flows through that one server, its can have a high hit rate. In a distributed system like Chord, each request would require a lookup traversing mostly different sets of fingers before reaching the ultimate destination.

3 comments:

Ari Rabkin said...

I think that tying the logical "endorsement" hierarchy to the physical lookup hierarchy is a major misfeature of DNS. You can tease them apart with DNS-Sec, where you have certificate chains instead. That way, anybody and their cousin can cache A records, and hand them out like halloween candy.

Randy H. Katz said...

Hey Steve, you seem to be a bit behind!

Randy H. Katz said...

STILL behind I see!