Thursday, October 30, 2008

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

In what has got to be the most-cited DHT paper, the authors develop a protocol providing the "put" and "lookup" primitives in the wide area and in a completely distributed manner. The basic structure is a "ring" of keys, which is just to say that keys are chosen from Z/2^nZ. What is very nice about their design is that correctness depends only on a single invariant: that the "next" (successor) pointer each Chord node maintains is correct. This allows them to analyze some of the basic properties of the system without additional complexity; for instance, the optimization for log-time lookups (the finger table) is not required for lookups to work, as you can always reach any key in linear time by following the "next" pointers around the ring. They also work out in some detail the join, leave, repair, and stabilization protocols which allow the DHT ring to remain stable in the face of various types of events.
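
To make the invariant concrete, here is a minimal sketch (my own toy, not the paper's code) of Chord-style lookup: with only correct "next"/successor pointers the loop degenerates to walking the ring linearly, and the finger table just lets each step skip most of the remaining distance.

```python
# A minimal sketch of Chord-style lookup over a ring of 2**M identifiers.
# Node setup, names, and helpers are illustrative, not the paper's code.

M = 6  # identifiers are drawn from Z/2^M Z

def in_arc(x, a, b):
    """True if x lies on the clockwise arc (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the arc wraps around zero

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self       # the single invariant correctness needs
        self.fingers = [self] * M   # fingers[i] ~ successor(id + 2**i)

    def closest_preceding_finger(self, key):
        for f in reversed(self.fingers):
            if f.id != key and in_arc(f.id, self.id, key):
                return f
        return self

    def find_successor(self, key):
        node = self
        while not in_arc(key, node.id, node.successor.id):
            nxt = node.closest_preceding_finger(key)
            # With no useful finger, fall back to the "next" pointer;
            # the lookup is then linear but still correct.
            node = nxt if nxt is not node else node.successor
        return node.successor
```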

I don't think there's a question that DHTs are a nice idea with some good applications; the MIT group responsible for this paper went on to explore some more or less obvious uses of them, such as Ivy, which is a DHT-based file system (actually, there are many Ivys). In the first paper the authors contrast their flat key space and completely distributed algorithm with more "traditional" distributed systems which function by imposing hierarchy on the collection of machines; they cite FastTrack and DNS. While there is a benefit to being distributed, it also seems to suffer from a lack of the "authoritative answers" that DNS can provide; that system is fundamentally hierarchical, which makes it perhaps not the best example. The completely distributed nature also makes caching more of an issue. For instance, in a hierarchical system with high fanout but relatively low tree depth, if the nodes closer to the leaves of the tree cache results they can deflect a good deal of traffic from traveling up the tree. This is the equivalent of your ISP's DNS server caching recent DNS queries. Since all traffic flows through that one server, it can have a high hit rate. In a distributed system like Chord, each request would require a lookup traversing mostly different sets of fingers before reaching the ultimate destination.

Looking up Data in P2P Systems

The authors of this paper (the same group that came up with Chord, and perhaps also Tapestry) present a bit of a "state of the field" in distributed hash tables. Although it is a short paper in CACM, they actually go into some detail about four different DHT algorithms: Chord, Tapestry, Pastry, and Kademlia. It's an interesting read to hear about the differences, such as the distance functions, where they identify the most important properties as symmetry and unidirectionality. They also look briefly at other lookup ideas, like tree routing and CAN (not entirely unrelated).
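
As a small illustration of the distance-function point (my own example, not the paper's), compare Chord's clockwise distance, which is unidirectional but not symmetric, with Kademlia's XOR metric, which is both:

```python
# Two DHT distance functions; identifier width and values are illustrative.

M = 160  # e.g., SHA-1 identifiers

def chord_distance(a, b):
    """Clockwise distance from a to b on the ring: d(a, b) != d(b, a)."""
    return (b - a) % (1 << M)

def kademlia_distance(a, b):
    """XOR metric: symmetric, and for a fixed target each distance value
    is reached from exactly one point (unidirectionality)."""
    return a ^ b

a, b = 0x1234, 0xABCD
assert kademlia_distance(a, b) == kademlia_distance(b, a)
assert chord_distance(a, b) != chord_distance(b, a)
```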

They note that dealing with concurrent and asynchronous failures and joins is a challenge that is hard to address in all of these systems, although Chord presents some formal proofs of resilience under failures. I think the most interesting piece of future work that they note is "proximity routing," which looks like an attempt to choose "fingers" which are nearby so as to minimize lookup latency. This seems like only one way you might incorporate topological information into the hash table's decisions, but the issue of trying to generate an overlay which respects the underlying topology seems like a good research direction.

Tuesday, October 28, 2008

Resilient Overlay Networks

I think the central insight of this paper is that some things, like multi-path routing and optimizing for different application metrics, are not well handled at the IP layer. The reason is that the Internet is addressed in a flat address space, and so it's generally frowned upon to make assumptions about which traffic is important, or which destinations are being used. Therefore, in order to achieve the performance in the paper, a great deal of routing state would need to be exchanged by BGP, most of which would remain unused.

The authors' solution is to create an overlay network: essentially, just another network layer with its own routing protocol, which uses the routing provided by an underlying layer to create virtual links between the overlay routers. This design is reminiscent of a lot of other work, like Application Layer Multicast, and maybe even Ken Birman's Isis (hi Ari...). In fact, routing is even easier with virtual links than with physical links, because each router is directly connected to every other router through a virtual link (i.e., any IP host can talk to any other). Therefore, flooding routing state is just sending N-1 Internet packets to the other routers; we don't have to worry much about complicated flooding mechanisms.
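
A sketch of what "flooding is just N-1 packets" looks like over a full mesh of virtual links; the message format and port number here are hypothetical, not RON's wire protocol.

```python
# Flooding link state in an overlay where every node reaches every other
# node directly over IP: just one unicast send per other router.

import json
import socket

def flood_link_state(my_probes, peers, port=50000):
    """my_probes: dict mapping peer address -> measured latency/loss.
    peers: list of IP addresses of the other overlay nodes."""
    msg = json.dumps(my_probes).encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for peer in peers:
        sock.sendto(msg, (peer, port))  # one packet per other router
    sock.close()

# Example (hypothetical addresses):
# flood_link_state({"10.0.0.2": 12.5}, ["10.0.0.2", "10.0.0.3"])
```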

One thing I really like about this design is that it optimizes both state and control traffic for endpoints which are in use, while ignoring the rest of them. On an Internet scale, a 50-node subnet is pretty insignificant, and optimizing the same metrics for every 50-node subset would probably be impossible. I do dislike the generally arrogant tone of this paper, but it's interesting that it works.

Their result that at most one intermediate node is sufficient to overcome faults is actually pretty obvious in hindsight; they even have a nice explanation of why in the paper that bears repeating. The underlying problem is that (1) BGP converges slowly since it's a distance-vector protocol, and (2) multihoming is kind of broken. Essentially, all they do with one backup intermediate router is to make the endpoints' ASes "virtually multihomed." From this angle, you really might get the impression that their whole argument comes down to fixing BGP, which might not even be that hard; it's mainly a deployment issue. Nonetheless, an enjoyable read.
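
A toy version of the one-intermediate idea (not RON's actual policy or metrics): given measured virtual-link latencies, compare the direct path with every single-hop detour and take the best.

```python
# Pick the best of the direct route and all one-intermediate detours.

def best_route(src, dst, latency):
    """latency: dict keyed by (a, b) with measured one-way latency in ms;
    missing keys mean the virtual link is currently unusable."""
    INF = float("inf")
    best = (latency.get((src, dst), INF), [src, dst])
    nodes = {a for (a, _) in latency} | {b for (_, b) in latency}
    for mid in nodes - {src, dst}:
        cost = latency.get((src, mid), INF) + latency.get((mid, dst), INF)
        if cost < best[0]:
            best = (cost, [src, mid, dst])
    return best

# Example: the direct link has failed, but a one-hop detour survives.
latency = {("A", "B"): 30, ("B", "C"): 25}   # ("A", "C") missing: outage
print(best_route("A", "C", latency))          # (55, ['A', 'B', 'C'])
```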

Sunday, October 19, 2008

On Power-Law Relationships of the Internet Topology

The basic methodology and contribution of this paper is to examine aggregates from routes collected by BGP over a period of approximately one year. Given this data set, the authors argue that the relevant properties of the Internet graph are best described by power-law distributions rather than simple averages. For instance, in this work, they describe three power-law distributions: between router out-degree and router rank, between router out-degree and the frequency of that degree, and in the size of hop-neighborhoods. The authors argue that their results show a sort of invariance in the basic structure of the Internet even though the network changed significantly during the time the measurements took place.
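
A quick sketch of the kind of fit involved (the degree data here is made up): the claimed power-law exponent is just the slope of a least-squares line on log-log axes of rank versus out-degree.

```python
# Fit a power law to a (fabricated) out-degree distribution.

import numpy as np

degrees = np.array(sorted([1, 1, 1, 2, 2, 3, 3, 4, 6, 9, 15, 40], reverse=True))
ranks = np.arange(1, len(degrees) + 1)

# Fit log(degree) = exponent * log(rank) + c  (least squares in log space).
exponent, c = np.polyfit(np.log(ranks), np.log(degrees), 1)
print("rank exponent ~", exponent)
```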

This paper seems easy to criticize, both typographically and methodologically. I do want to try and appreciate what the authors are attempting to accomplish; this seems like a very difficult type of paper to write, since no matter what your conclusions and data are, you will probably get attacked. All that notwithstanding, my main question would be to consider how well this work has aged; the Internet has certainly grown by much more than 40% between 1997 and the present, and the authors' conclusions would be considerably more convincing if they held between then and now. It appears from a cursory googling that the authors did publish a sort of follow-up in 2003 where they claim that these laws continued to hold over five years.

While I would be a forgiving reviewer towards the actual data they present, I do take issue with some of the final sections, where the authors wax poetic about "finding order in chaos." They also present some completely uninterpreted results about the power law of eigenvalues. While they cite some work which I'm sure has very interesting interpretations of why we should care about the eigenvalues, this section seems particularly unmotivated.

Thursday, October 16, 2008

TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks

It's important to read this paper in context, so I'll try to provide some of that here as well as a summary of "what's changed," because a lot is different from when this was published in OSDI. In 2002, "sensor networks" were (I believe) widely thought to require a new and different model of network. The idea of "motes strewn from airplanes" which would get distributed on the ground was a [somewhat] commonly cited example of how motes would be deployed; after landing, the motes would form a sort of "intelligent cloud" which would crunch the data and report the "answer" to someone outside the network. The goal of allowing non-computer scientists to create a sensor network application is a popular motivation for this kind of work. Of course, this is partially hyperbole, but it's also somewhat clear how TAG fits into this picture.

What's changed since 2002 is hard experience in application deployment and software development. Numerous early deployments were hobbled by badly performing hardware and software. Furthermore, contact with various domain specialists all indicated that they did not generally want a summary of the data; they wanted the data. That restricts the realm of on-line, fully distributed networks like TAG, TinyDB, ADMR, etc. to a special subset of applications. For many applications, the two most critical network primitives are collection (many to one) and dissemination (one to many). The importance of these primitives is clear even in this TAG paper, as collection or tree routing is used to report data to a centralized controller and dissemination to distribute queries through the network.
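
As a very much simplified illustration of what TAG-style in-network aggregation buys you over raw collection, here is a toy tree aggregation for an AVG query; the node names and readings are mine, not TinyDB's code.

```python
# Each node combines its own reading with its children's partial state
# (sum, count), so the root hears one record per child rather than one
# record per sensor in the network.

def aggregate(node, readings, children):
    """Recursively compute (sum, count) for the subtree rooted at `node`;
    this partial state is enough to answer an AVG query at the root."""
    s, c = readings[node], 1
    for child in children.get(node, []):
        cs, cc = aggregate(child, readings, children)
        s, c = s + cs, c + cc
    return s, c

readings = {"root": 20, "a": 22, "b": 19, "c": 25}
children = {"root": ["a", "b"], "a": ["c"]}
s, c = aggregate("root", readings, children)
print("AVG =", s / c)
```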

A major problem with early work like this is that it jumped ahead to design complicated systems like this in-network aggregation system before ever mastering the basics of collection and dissemination. A good example of this is the performance of their multi-hop collection protocol: they cite 45% as the yield; however, given the network size and layout parameters they list, that low yield is more a reflection of poor algorithms and implementation than anything else. Part of this is not their fault, as modern IEEE 802.15.4 radios operate at 250 kbps, which increases potential throughput, and use the more advanced QPSK encoding, which significantly reduces the packet error rate compared to the older FSK radios. However, overall this paper overreaches because no service decomposition existed to allow the authors to focus on the central element of their work (the aggregation); they must consider the entire stack. It's kind of like trying to write a paper about BitTorrent, but needing to design IP and Linux to do so.

One thing this paper gets right that is still topical is the incorporation of storage into the network architecture; most motes have significant amounts of non-volatile storage (flash) compared to their RAM and radio bandwidth. It is often advantageous from an energy standpoint to stream a number of packets sequentially to amortize the fixed cost of waking up a receiver and acquiring the channel.
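
A back-of-the-envelope sketch of the amortization argument; all the energy numbers below are made up for illustration.

```python
# The fixed cost of waking the receiver and acquiring the channel is paid
# once per burst instead of once per packet, so per-packet energy falls as
# the batch grows.

wakeup_cost_mj = 2.0   # assumed fixed cost per rendezvous, in millijoules
per_packet_mj = 0.1    # assumed cost to actually transmit one packet

def energy_per_packet(batch_size):
    return (wakeup_cost_mj + per_packet_mj * batch_size) / batch_size

for n in (1, 10, 100):
    print(n, "packets/burst ->", round(energy_per_packet(n), 3), "mJ/packet")
```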

ExOR

Sorry about not reviewing these papers; it was a busy day. These are some of my favorite papers because they really make clever use of opportunistic forwarding to improve efficiency. The real trick is telling who has received a message; as I recall, they do this with a backoff and by snooping on the retransmitted packets. Finally, they use a standard routing protocol for final delivery. This is some fine systems work because it demonstrates a nice solution to a clear problem.
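
Here is a toy model of the opportunistic-forwarding idea as I've described it (a generic illustration, not ExOR's actual batching protocol): candidate forwarders are prioritized, the best one that actually received the packet forwards it, and the others suppress themselves when they overhear that forwarding.

```python
# One opportunistic hop: whoever is highest-priority among the actual
# receivers forwards; lower-priority candidates back off and stay quiet.

import random

def opportunistic_hop(candidates, delivery_prob):
    """candidates: forwarders ordered best-first; delivery_prob: chance
    each one hears the broadcast. Returns who forwards, if anyone."""
    received = [c for c in candidates if random.random() < delivery_prob[c]]
    for c in candidates:        # highest priority gets the shortest backoff
        if c in received:
            return c            # everyone else overhears and suppresses
    return None                 # nobody received it; the sender retries

print(opportunistic_hop(["far", "mid", "near"],
                        {"far": 0.3, "mid": 0.6, "near": 0.9}))
```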

Sunday, October 5, 2008

A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols

This is a classic example of when you want to write a "tools" paper (they implemented a whole bunch of stuff in ns-2), but in order to get it published you have to actually use the tool for something. Therefore, we'll talk about their evaluation (I'll skip the discussion of how these four protocols work). I'll confine myself mostly to methodology, since the actual results are ten years old and mostly irrelevant by now.

They choose as their mobility model the infamous "random waypoint" model. The reason I say infamous is that practically nothing in the real world behaves anything like this; it is, however, easy to implement. There's an apocryphal story floating around academia that the one thing that does behave like this is tractors, and that the model was developed for networking between Caterpillars. In any case, it has been copied altogether too much. Their parameters don't improve the situation: a maximum speed of 10 m/s! That's about 22 mph, or about how fast a tractor might go. This sort of mobility model is especially pervasive in the MANET world; I could cite a bunch of other evaluations which use it...
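
For reference, here is roughly what the model amounts to (my own sketch; the pause time is omitted, and the area and speed bounds are illustrative): pick a uniformly random destination and speed, walk there in a straight line, repeat.

```python
# Random waypoint mobility, stripped to its essentials.

import itertools
import math
import random

def random_waypoint(x, y, area=1500.0, max_speed=10.0, dt=1.0):
    """Yield (x, y) positions once per dt seconds, forever."""
    while True:
        dest = (random.uniform(0, area), random.uniform(0, area))
        speed = random.uniform(0.1, max_speed)  # avoid the zero-speed pathology
        steps = max(1, int(math.hypot(dest[0] - x, dest[1] - y) / (speed * dt)))
        for i in range(1, steps + 1):
            yield (x + (dest[0] - x) * i / steps, y + (dest[1] - y) * i / steps)
        x, y = dest

for pos in itertools.islice(random_waypoint(0.0, 0.0), 5):
    print(pos)
```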

One bright spot of their evaluation is the discussion of routing overhead, which shows that some protocols are not sensitive to mobility (DSDV-SQ), while others are, but have a very high all-around overhead (TORA).

My view on this paper is that the contribution of radio models to ns-2 is worthy of a publication, but the actual evaluation is poisoned by an almost completely unmotivated choice of the random waypoint model. If they really wanted to evaluate the performance of these protocols under mobility, they should have developed several models (random waypoint could be one of them) and compared performance across them. Other models could be an infrastructure model, or ones developed with specific applications in mind. ZebraNet [1] is an example of a place where they gave some more thought to the mobility model.

[1] http://www.princeton.edu/~mrm/zebranet.html

A High-Throughput Path Metric for Multi-Hop Wireless Networking

First of all, I would usually refer to this paper as "the ETX paper," and will make a few general comments about this work. If it seems surprising that something this simple could be novel as late as 2003, I think a few points are in order. Previous routing protocols for wired networks like RIP or OSPF do allow metrics for path computation, and use Bellman-Ford or Dijkstra (in the case of link-state protocols) to compute routes, but they tend to use rather simplistic link metrics like hop count or bandwidth. The most important difference between those link metrics and this work is that link quality in wireless cannot generally be known beforehand. Thus, the key change is not really the particular metric, but the fact that you need to do on-the-fly estimation of the link quality.
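
For concreteness, the ETX computation itself: with forward delivery ratio d_f and reverse delivery ratio d_r (the link-layer ACK has to make it back), the expected number of transmissions on a link is 1 / (d_f * d_r), and a path metric is just the sum over its links.

```python
# ETX of a single link and of a path (values here are illustrative).

def etx(forward_ratio, reverse_ratio):
    if forward_ratio == 0 or reverse_ratio == 0:
        return float("inf")          # link is currently unusable
    return 1.0 / (forward_ratio * reverse_ratio)

print(etx(0.9, 0.9))                 # ~1.23 expected transmissions
print(etx(0.9, 0.9) + etx(0.5, 1.0)) # two-hop path: ~3.23
```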

A fundamental difference between wireless and wired networks is that the process of routing in wireless can be thought of as a two-step process. In the first step, "topology formation," a routing topology is chosen, where decisions are made about which links are to be used; in the second, "routing," actual paths are chosen through the network. What should be noted is that the innovation of ETX is to combine these two steps, so that the metric implicitly weeds out poor links. In contrast, the previous technique of routing based on hop count does not make this separation; in order to form a hop-count based path, you must first take a position on which links are available to be routed over. This issue is explored more deeply in Alec Woo's thesis; the basic result is that you can't just use a threshold to determine which links are good to route over. We also discuss this issue in [1].

One way this work is overly simplistic is its treatment of the estimation problem. Their estimator computes a simple windowed estimate over 10 seconds of the packet reception ratio in both directions, and inverts this to form an expected transmission count. This metric has a number of nice properties, like correlation with the energy cost of using the link, and it also provides an estimate for an appropriate number of link-layer retransmissions to use. There has been some advance on this front; more current implementations might use an EWMA filter, but there is still the parameter-tuning problem of choosing a horizon. Newer link estimators like [2] attempt to add more information to further improve performance.
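
A sketch of what an EWMA-style estimator might look like (one direction only, with an arbitrary window and alpha; a real implementation would combine both directions as ETX does):

```python
# Windowed packet-reception-ratio samples smoothed by an EWMA; the window
# and alpha are exactly the "horizon" parameters that need tuning.

class EwmaLinkEstimator:
    def __init__(self, alpha=0.9, window=10):
        self.alpha = alpha        # weight on the old estimate
        self.window = window      # packets per update
        self.prr = 1.0            # current reception-ratio estimate
        self.received = 0
        self.expected = 0

    def packet(self, got_it):
        self.expected += 1
        self.received += 1 if got_it else 0
        if self.expected == self.window:
            sample = self.received / self.expected
            self.prr = self.alpha * self.prr + (1 - self.alpha) * sample
            self.received = self.expected = 0

    def etx(self):
        return float("inf") if self.prr == 0 else 1.0 / self.prr

est = EwmaLinkEstimator()
for got in [True] * 8 + [False] * 2:
    est.packet(got)
print(est.prr, est.etx())
```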

Their evaluation is lengthy and somewhat tedious at this point; perhaps it was more of a surprise five years ago than it is now that a decent path metric is a good idea.


[1] The Effect of Link Churn on Wireless Routing. Stephen Dawson-Haggerty,
Jorge Ortiz, Xiaofan Fred Jiang and David E. Culler. Technical Report
No. UCB/EECS-2008-109, University of California Berkeley, Computer Science
Division. 2008.
[2] Four-Bit Wireless Link Estimation. Rodrigo Fonseca, Omprakash Gnawali, Kyle Jamieson, and Philip Levis. HotNets 2007.

More TCP Exploits

The never-ending stream of protocol and implementation exploits continues... This one seems to be peculiar to connection initiation; there's not really enough information to tell exactly what's going on here, but it sounds like it might be implementation specific. Although, if it's really a problem with queued connections at startup, it seems plausible that it's a bit more general than that.

http://securityandthe.net/2008/10/02/more-details-about-tcpip-attack-surface/

Thursday, October 2, 2008

Architecture and Evaluation of an Unplanned 802.11b Mesh Network

This paper covers the basics well, and at the end presents an impressive analysis of the performance of their architecture. Their architecture is relatively simple. Packets are encapsulated in a custom routing frame, and link estimation is performed using periodic beacon-ack pairs to estimate the probability of DATA-ACK success. CTS/RTS is disabled (this is typical, I believe), probably to reduce protocol overhead. Each node either routes traffic to the Internet if it is connected, or else forms a default route to a gateway to deliver traffic. The use of source routing everywhere is also notable; this is an approach that really began to be introduced with DSR, as they note, and seems to be gaining favor in ad-hoc networks. I think there are a couple of reasons for this: state is local to sending nodes, so there is never any confusion about where the packet will go. This makes routing loops impossible, and furthermore eliminates the need for recovery in the case where an intermediate node reboots and comes back up with uninitialized routing tables. This sort of local inconsistency can be very problematic for both link-state and distance-vector algorithms.

They evaluate their solution along various axes, including node density and various properties of the topology; they also show which links are most important to the routing. One aspect that I thought was particularly interesting was the multi-speed nature of the links, which has implications at the network layer. As they point out, it may be more beneficial to take more hops at a higher bit rate than fewer hops at a lower one, and due to the spacing of the different speeds, a link with link-layer losses of up to 50% may still be the "best" link. They are helped in their design by the fact that energy is not a concern; all the routers are plugged into the wall.
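
A rough worked example of that trade-off (my numbers, not theirs): compare expected air time per delivered 1500-byte packet, assuming losses are independent and lost packets are simply retried.

```python
# Expected transmission time per delivered packet, with geometric retries.

PACKET_BITS = 1500 * 8

def expected_airtime_ms(bitrate_mbps, delivery_ratio):
    one_try_ms = PACKET_BITS / (bitrate_mbps * 1e6) * 1000
    return one_try_ms / delivery_ratio

print(expected_airtime_ms(11, 0.5))  # 11 Mbps at 50% loss: ~2.2 ms
print(expected_airtime_ms(1, 1.0))   # 1 Mbps, lossless:    ~12 ms
```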

Modeling Wireless Links for Transport Protocols

This is an almost ideal journal paper, as it consists mainly of a synthesis of ideas from the literature by experts in the field. The authors examine what is important to model in a wireless link. One reason I like this paper is that the authors focus on a sensitivity analysis concerning what is actually important to simulate in a wireless link. This is a step away from the more EE-style approach of modeling the physical characteristics of the link using fading and noise models. I would argue that the approach advocated in this paper is not universally accepted; the central tenet is that it is possible to distill out the independent variables controlling transport behavior and model them directly; these variables are things like losses, delay variation, reordering, and the queuing discipline in the router. There is, however, a cottage industry in academia involved in creating physically realistic PHY models; for instance, the TinyOS packet-level simulator tries to model noise by using measured traces to reproduce physically realistic packet successes and failures [1]. The advantage of that approach is that the model can be used to test more than just transport protocols; the entire stack can be tested at once. However, the modeling approach suggested by Gurtov & Floyd allows more freedom to explore the parameter space, since it directly models the variables which impact performance rather than relying on the emergent behavior of a lower-level simulation.
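
In that spirit, here is a sketch of what such a parametric link model might look like (the parameter names and values are mine, not the paper's): rather than simulating the PHY, it directly injects loss, delay variation, and occasional reordering.

```python
# A directly parameterized wireless link model for exercising transport
# protocols: drop, delay, and occasionally reorder a stream of packets.

import random

def wireless_link(packets, loss=0.02, base_delay_ms=50, jitter_ms=200,
                  reorder_prob=0.01):
    """Return (packet, arrival_time_ms) pairs as a transport protocol
    would see them after crossing the modeled link."""
    arrivals = []
    now = 0.0
    for p in packets:
        now += 1.0                               # one packet sent per ms
        if random.random() < loss:
            continue                             # dropped on the link
        delay = base_delay_ms + random.expovariate(1.0 / jitter_ms)
        if random.random() < reorder_prob:
            delay += jitter_ms                   # occasional late arrival
        arrivals.append((p, now + delay))
    return sorted(arrivals, key=lambda a: a[1])  # deliver in arrival order

print(wireless_link(range(10))[:3])
```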

Another reason this is a wonderful survey paper is that the authors consider a wide variety of wireless link technologies in order to discover what the differences are. They consider WLANs, GPRS, and satellite links, which span a wide spectrum of bandwidths, latencies, and reliabilities. One point I liked was that for each technology and link characteristic (loss, delay, etc.), they examined how prevalent that characteristic is in the technology. Personally, I thought that they spent too much time on some of the characteristics and too little time on the basics of error losses and delay variation. In other words, there didn't seem to be a clear ordering of which is most likely to occur and thus is most important to model. Although it might vary between technologies, I suspect loss and jitter are important in nearly every technology.

[1] HyungJune Lee, Alberto Cerpa, and Philip Levis. "Improving Wireless Simulation Through Noise Modeling." In Proceedings of the Sixth International Conference on Information Processing in Wireless Sensor Networks (IPSN), 2007.