Wednesday, December 10, 2008

More Tools

Always impressed with a good tool, I've just learned about tcptrace; the homepage is here. It can take a tcpdump capture and generate basically any graph or statistic you could possibly want: throughput, data-in-flight, time-sequence number plots, etc. It outputs graphs in xplot format. A word of warning: the Debian package version of tcptrace worked fine for me, but the Debian version of xplot wouldn't accept its output; maybe there's a version mismatch. I'm not actually sure the Debian xplot is the same xplot that tcptrace expects. Building xplot from source fixed the problem.

Also of interest is xpl2gpl, which can convert the xplot files to gnuplot files, if that's your thing. It seemed to work okay...

Thursday, November 27, 2008

pathchar

I just had one of those serendipitous Google experiences where you learn something: the snippets bar above Gmail had a link to a company called Packet Design, which has developed a tool to improve visibility into IP routing protocols; it sounds like BGP, IS-IS, EIGRP, and OSPF mostly. Anyways, I always look at the people first, and in this case it looks like both Van Jacobson and Judy Estrin are on the board; okay, this seems serious. However, Van Jacobson's bio lists several tools he's developed, like traceroute, and one I had never heard of: pathchar. Maybe this is well known, but no one brought it up in class the other day when we were all talking about estimating hop-by-hop latency. Several [very old] binaries are available at ftp://ftp.ee.lbl.gov/pathchar/; the linux one sort of seemed to work for me. What's very interesting are the presentation slides explaining the challenges of estimating link latencies; he notes (obviously, I suppose) that queuing delay is what makes this hard, although you can easily filter it on a single hop.

Thursday, November 20, 2008

Scalable Application Layer Multicast

The authors in this paper present their design of a hierarchical application-layer multicast protocol they term NICE. In this context, an "application-layer" implementation of a network protocol means that all protocol functionality is present at the edges of the network; in this case only in the actual users of the data, in contrast to a "network-layer" network protocol (what is standard) where the forwarding and routing is distributed across the network fabric. What this means in practice is that application-layer overlays must be constructed out of point-to-point IP links rather than any more exotic delivery model, since IP links are what is available.

If one were attempting to design a scalable application-layer multicast, hierarchy seems like the natural approach, since it works so well in making all things scalable; what is novel here is a well-stated algorithm for dealing with joins and leaves, as well as failures and partitions. In this regard, it reads similarly to the Chord paper in that the challenge really seems to be dealing with the corner cases well.

This actually seems like a pretty reasonable scheme. I suspect a lot of applications could or do have this sort of scheme built into them, since it lets them scale nicely.

Monday, November 17, 2008

An Architecture for Internet Data Transfer

This must be a very good paper, because I was really getting ready to hate it, and by the end it had pretty much sold me that this was a good idea. I think the main lesson here is just how deeply knowing about BSD sockets can sink into your psyche...

Anyhow, the basic idea here is to apply the separation of control and data planes to internet transport. Essentially, a low-latency protocol can be used for signaling while a whole variety of different bulk transfer protocols can be used to move the data. I think there are a couple of interesting pitfalls this work could have fallen into, and I think they avoided most of them. First, the paper isn't really about inventing new transport protocols, although they did, and it's nice that they work. Second, I think it would be possible to do this work without producing anything useful. What really sold it to me was the example of integrating a USB key or floppy into the network for large transfers; this is actually a useful artifact because there are definite times when you wish you could just dump data onto a key but it's locked in an application which can't dump data cleanly except to a socket.

By taking a well-defined problem, the authors were also able to apply some of the good ideas from content addressable networks to a very relatable example. This paper should definitely remain on the list.

A Delay-Tolerant Network Architecture for Challenged Internets

This paper begins with the statement that mostly-disconnected networks cannot participate directly in most internet protocols because those protocols were not designed for this regime. The paper then proposes a dramatic architectural shift which leaves no layer untouched. The architectural idea is that each network in a challenged regime has already been optimized for its particular space, with its own naming and configuration mechanisms. Thus [it is claimed] what is necessary is a new layer of glue consisting of a new generalized naming layer and a set of gateways to bridge between these challenged networks, like military ad-hoc networks or sensor networks, and other, more reliable networks like the internet. This paper proposes a lot in eight pages.

Without going further, I think it's worth bringing up what I think is the central architectural fallacy of this paper: that a new naming layer is necessary to support these protocols. The reason this is a fallacy is that we already have such a layer which is incredibly general and imposes none of the semantics which this paper seems to view as problematic. It's called IP. While I agree that below IP the link layers differ, within IP routing differs, and above IP transport cannot remain the same, the fact is that IP presents a naming mechanism which seems no worse than what is proposed here. We already have a name for "DTN Gateways": routers.

The paper does have a number of bright spots. They do consider SMTP as valuable prior work, although noting that static mail relays will never work in a mobile environment. It also correctly discards application layer gateways as blemishes on a network architecture as they spread application code throughout the network.

If you examine the set of tasks presented in section 4, like path selection, custody transfer, and security, the issues sound no different than what would be expected for bringing the internet to a new network layer, and so I would conclude that the paper doesn't really succeed in making the case for a wholesale architectural shift; as is noted in the conclusion, an evolutionary approach of better link technologies and more adaptive internet protocols seems likely to solve the problem this paper addresses.

Friday, November 14, 2008

DNS Performance and the Effectiveness of Caching

This is a relatively modern study of how often DNS queries can be serviced from a cache, based on snooping packets from the MIT Computer Science Laboratory and on traces from KAIST. Their most interesting result is that they claim reducing TTLs to as little as a few hundred seconds has almost no effect on hit rates. They further claim that sharing a cache between more than a few clients is not beneficial due to the long-tailed nature of DNS requests. These conclusions seem reasonable for the small number of users in each of their studies; they seem to claim that they will hold for larger groups as well because of the scaling properties of the hit rate in Fig 12. This figure is making a statement about the overlap in the set of domain names requested by a set of users. The result that NS records are well cached, so that most request traffic never has to touch the upper levels of the hierarchy, has a nice pithy quality I admire; furthermore it makes sense and bodes well for things like DynDNS.

This paper has an interesting result, but other than the result itself I think reading the entire paper might be overkill for the class. I would rather we spend some time on security and discuss DNSSec, since our reading list has almost no papers considering security on it and I think that paper would spur an interesting discussion.

Development of the Domain Name System

This paper presents an essential internet architectural component: a naming service which translates between internet ASCII names and other information; most commonly, the internet address of the server with a given name. One interesting design decision is that DNS is actually somewhat more general than we might guess from our daily usage, with a notion of record types and support for many different types of lookups. This functionality doesn't seem to be used all that much, because DNS servers typically are run by professional hosting staff these days who often have somewhat ossified views on what is possible. There are some record types which have taken on new importance; MX records are often used to filter spam by allowing only a few mail servers per domain.

There have been some enhancements made to DNS since it was first developed, although the overall structure does not seem to have changed much. There are numerous security problems with the caching architecture of the DNS system, as poisoning a cache near a large group of users can easily redirect their requests to an adversary. There are a large number of RFCs defining DNSSec, a set of cryptographic extensions to DNS which signs all data to prevent a whole class of attacks which are currently possible if DNS servers are compromised. DynDNS is another set of extensions which allows a mobile host to update its A record on the fly, so it can migrate between subnets and still have lookups resolve correctly. Of course, if this is being used, caching becomes much less effective, because caching for more than a few minutes is very problematic. This (I believe) has received some real traction and is implemented in modern versions of BIND.

One thing which is interesting to me is the question of how far down the hierarchy users were originally envisioned to have control. To put it differently, it seems desirable to give users control over a portion of a globally accessible namespace for storing their own data; I might want to have a DNS server for my subdomain which allows me to store my public key and other things. However, as DNS has evolved, this type of usage has not emerged. I wonder if there is room for another naming service with a more "grass roots" oriented service model.

Thursday, November 13, 2008

End-to-End Internet Packet Dynamics

First of all, I really like this study; the writing is straightforward, almost matter-of-fact, and the paper is very clear about what its goals are, what the contributions are, and where the data or analysis is not complete. This is a "pure" measurement paper in that it treats the results as interesting in and of themselves and does not lower itself by implementing something based on the results. This isn't sarcasm; I think this is actually a good thing in a measurement paper.

The analysis starts with a good data set; Paxson collects traces from many 100 KB TCP sessions between 39 measurement points, geographically distributed across North America and Europe. To me, the most interesting result was the prevalence of packet reordering and damage in the traffic he observed. It wasn't possible to verify the checksum on the whole packets, and there may have been too much traffic to log all the packets, but it seems like this could easily be solved either by computing the checksum on the fly as packets come in, or else actually storing the whole trace; it may not have been possible in 1995 but would be no problem today.

XTrace

I like xtrace; it provides some really good insights about what is actually happening during complex data center and internet transactions across many machines and services. What is critically important to its ability to do this is its acceptance of the need for a limited amount of application modification to enable the detailed tracing xtrace performs. Other studies which attempt to infer causality from snooped data traffic are significantly more limited in the conclusions they can draw because they cannot incorporate a notion of application-level semantics in their analyses. XTrace's notion of a persistent transaction ID allows the application being traced to define the operation which is meaningful. It also persists across layers, so that the result is a trace showing all operations initiated by a single transaction anywhere in the system. The authors present several compelling examples of where the tool was used to diagnose performance problems and bugs.
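
To make the transaction-ID idea concrete, here is a tiny sketch of my own (not the real X-Trace metadata format or reporting API): every operation logs its task ID plus a pointer to the operation that caused it, and the causal graph is reassembled from those reports afterwards.

```python
# Hypothetical names throughout; this only illustrates ID propagation.
import uuid

REPORTS = []  # stand-in for X-Trace's out-of-band report collection

def start_task():
    """Create fresh metadata at the edge of the system."""
    return {"task_id": uuid.uuid4().hex, "parent_op": None}

def traced(meta, op_name):
    """Log this operation and return metadata to hand to anything it calls."""
    op_id = uuid.uuid4().hex
    REPORTS.append({"task": meta["task_id"], "op": op_id,
                    "parent": meta["parent_op"], "name": op_name})
    return {"task_id": meta["task_id"], "parent_op": op_id}

# e.g. an HTTP front end calling a cache, which calls a database:
m = start_task()
m = traced(m, "http_request")
m_cache = traced(m, "cache_lookup")
m_db = traced(m_cache, "db_query")

# Grouping REPORTS by task and following parent pointers yields the
# cross-layer trace that the paper's diagnosis examples are built on.
print(REPORTS)
```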

I think the main value of this work is as a window into the operation of complicated distributed systems which are otherwise appallingly opaque. This system may be only the first step toward greater visibility, because for a sufficiently large system, one needs to ensure that there is a mechanism for usefully interpreting the resulting causal graphs, which may be very large.

Sorry

For all you avid readers, sorry about the delay. I'm about to introduce some message reordering by posting in reverse order the papers which I missed while off at SenSys.

Thursday, October 30, 2008

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

In what has got to be the most-cited DHT paper, the authors develop a protocol providing the "put" and "lookup" primitives in the wide area and in a completely distributed manner. The basic structure is a "ring" of keys, which is just to say that keys are chosen from Z/2^nZ. What is very nice about their design is that correctness depends only on a single invariant: the fact that the "next" (successor) pointer each Chord node maintains is correct. This allows them to analyze some of the basic properties of the system without additional complexity; for instance, the optimization for log-time lookups (the finger table) is not required for lookups to work, as you can always reach any key in linear time by following the "next" pointers around the ring. They also work out in some detail the join, leave, repair, and stabilization protocols which allow the DHT ring to remain stable in the face of various types of events.
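
To make the "fingers are only an optimization" point concrete, here is a toy sketch (mine, not the paper's pseudocode) of a Chord-style lookup that returns the same answer with or without the finger table; without it, the lookup just degrades to a linear walk around the ring.

```python
M = 6                                  # identifiers are 0 .. 2^M - 1

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = None
        self.fingers = []              # node responsible for id + 2^i

def in_range(x, a, b):
    """True if x lies in (a, b] walking clockwise around the ring."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def strictly_between(x, a, b):
    """True if x lies in (a, b) walking clockwise around the ring."""
    return (a < x < b) if a < b else (x > a or x < b)

def lookup(node, key, use_fingers=True):
    while not in_range(key, node.id, node.successor.id):
        nxt = node.successor
        if use_fingers:
            for f in reversed(node.fingers):   # closest preceding finger
                if strictly_between(f.id, node.id, key):
                    nxt = f
                    break
        node = nxt
    return node.successor              # the node responsible for key

def build_ring(ids):
    nodes = sorted((Node(i) for i in ids), key=lambda n: n.id)
    for k, nd in enumerate(nodes):
        nd.successor = nodes[(k + 1) % len(nodes)]
        # finger i points at the first node clockwise from id + 2^i
        nd.fingers = [min(nodes, key=lambda x: (x.id - (nd.id + 2 ** i)) % 2 ** M)
                      for i in range(M)]
    return nodes

ring = build_ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
print(lookup(ring[0], 54).id)                     # 56, in a few finger hops
print(lookup(ring[0], 54, use_fingers=False).id)  # 56, walking successors
```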

I don't think there's a question that DHTs are a nice idea with some good applications; the MIT group responsible for this paper went on to explore some more or less obvious uses of them, such as Ivy, which is a DHT-based file system (actually, there are many Ivys). In the first paper the authors contrast their flat key space and completely distributed algorithm with more "traditional" distributed systems which function by imposing hierarchy on the collection of machines; they cite FastTrack and DNS. While there is a benefit to being distributed, Chord also seems to suffer from a lack of the "authoritative answers" that DNS can provide; DNS is fundamentally hierarchical, which makes it perhaps not the best example. The completely distributed nature also makes caching more of an issue. For instance, in a hierarchical system with high fanout but relatively low tree depth, if the nodes closer to the leaves of the tree cache results they can deflect a good deal of traffic from traveling up the tree. This is the equivalent of your ISP's DNS server caching recent DNS queries. Since all traffic flows through that one server, it can have a high hit rate. In a distributed system like Chord, each request would require a lookup traversing mostly different sets of fingers before reaching the ultimate destination.

Looking up Data in P2P Systems

The authors of this paper (the same authors who came up with Chord, and perhaps also Tapestry) present a bit of a "state of the field" in distributed hash tables. Although a short paper in CACM, they actually go into some detail about four different DHT algorithms: Chord, Tapestry, Pastry, and Kademlia. It's an interesting read to hear about the differences, such as the distance functions, where they identify the most important properties as symmetry and unidirectionality. They also look briefly at other lookup ideas, like tree routing and CAN (not really unrelated).

They note that dealing with concurrent and asynchronous failures and joins is a challenge that is hard to address in all of these systems, although Chord presents some formal proofs of resilience under failures. I think that the most interesting piece of future work that they note is the "proximity routing", which looks like an attempt to choose "fingers" which are nearby so as to minimize lookup latency. This seems like only one way that you might incorporate topological information into the hash table's decision, but the issue of trying to generate an overlay which respects an underlying topology seems like a good research direction.

Tuesday, October 28, 2008

Resilient Overlay Networks

I think the central insight of this paper is that some things, like multi-path routing and optimizing for different application metrics, are not well handled at the IP layer. The reason for this is that the Internet is addressed in a flat address space, and so it's generally frowned upon to make assumptions about which traffic is important, or which destinations are being used. Therefore, in order to achieve the performance of the paper, a great deal of routing state would need to be exchanged by BGP, most of which would remain unused.

The authors' solution is to create an overlay network: essentially, just another network layer with its own routing protocol which uses the routing provided by an underlying layer to create virtual links between the overlay routers. This design is reminiscent of a lot of other work, like Application Layer Multicast, and maybe even Ken Birman's Isis (hi Ari...). In fact, routing is even easier with virtual links than physical links, because each router is directly connected to every other router through a virtual link (i.e., IP hosts can talk to anyone). Therefore, flooding routing state is just sending N-1 internet packets to the other routers; we don't have to worry much about complicated flooding mechanisms.

One thing I really like about this design is that it optimizes both state and control traffic for endpoints which are in use, while ignoring the rest of them. On an Internet scale, a 50-node subnet is pretty insignificant, and optimizing the same metrics for every 50-node subset would probably be impossible. I do dislike the generally arrogant tone of this paper, but it's interesting that it works.

Their result that at most one intermediate node is sufficient to overcome faults is actually pretty obvious in hindsight; they even have a nice explanation of why in the paper that bears repeating. The underlying problem is that (1) BGP converges slowly since it's distance vector, and (2) multihoming is kind of broken. Essentially all they do with one backup intermediate router is to make the endpoints' ASes "virtually multihomed." From this angle, you really might get the impression that their whole argument comes down to fixing BGP, which might not even be that hard; it's mainly a deployment issue. Nonetheless, an enjoyable read.

Sunday, October 19, 2008

On Power-Law Relationships of the Internet Topology

The basic methodology and contribution of this paper is to examine aggregates from routes collected by BGP over a period of approximately one year. Given this data set, the authors argue that the relevant properties of the internet graph are best described by power-law distributions rather than simple averages. For instance, in this work, they describe three power-law relationships: between router out-degree and router rank, between router out-degree and the frequency of that degree, and in the size of hop-neighborhoods. The authors argue that their results show a sort of invariance in the basic structure of the internet even though the network changed significantly during the time the measurements took place.
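
For concreteness, the rank-degree law is just a straight line on a log-log plot; here is a hedged sketch (with made-up degrees, not their BGP tables) of how one would estimate the exponent by least squares in log-log space.

```python
import numpy as np

# illustrative data only: node degrees sorted into decreasing order
degrees = np.array([1000, 400, 250, 120, 80, 40, 20, 10, 5, 2], dtype=float)
ranks = np.arange(1, len(degrees) + 1, dtype=float)

# fit log(degree) = slope * log(rank) + intercept
slope, intercept = np.polyfit(np.log(ranks), np.log(degrees), 1)
print("estimated rank exponent:", slope)
# i.e. degree ~ exp(intercept) * rank ** slope
```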

This paper seems easy to criticize, both typographically and methodologically. I do want to try and appreciate what the authors are attempting to accomplish; this seems like a very difficult type of paper to write, since no matter what your conclusions and data, you will probably get attacked. All that notwithstanding, my main question would be to consider how well this work has aged; the internet has certainly grown by much more than 40% between 1997 and the present, and the authors' conclusions would be considerably more convincing if they held between then and now. It appears from a cursory googling that the authors did publish a sort of follow-up in 2003 where they claim that these laws continued to hold over five years.

While I would be a forgiving reviewer towards the actual data they present, I do take issue with some of the final sections, where the authors wax poetic about "finding order in chaos." They also present some completely uninterpreted results about the power law of eigenvalues. While they cite some work which I'm sure has very interesting interpretations of why we should care about the eigenvalues, this section seems particularly unmotivated.

Thursday, October 16, 2008

TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks

It's important to read this paper in context, so I'll try to provide some of that here as well as a summary of "what's changed", because a lot is different from when this was published in OSDI. In 2002, "sensor networks" were (I believe) widely thought to require a new and different model of network. The idea of "motes strewn from airplanes" which would get distributed on the ground was a [somewhat] commonly cited example of how motes would be deployed; after landing, the motes would form a sort of "intelligent cloud" which would crunch the data and report the "answer" to someone outside the network. The goal of allowing non-computer scientists to create a sensor network application was a popular motivation for work. Of course, this is partially hyperbole, but it's also somewhat clear how TAG fits into this picture.

What's changed since 2002 is hard experience in application deployment and software development. Numerous early deployments were hobbled by badly performing hardware and software. Furthermore, contact with various domain specialists all indicated that they did not generally want a summary of the data; they wanted the data. That restricts the realm of on-line, fully distributed networks like TAG, TinyDB, ADMR, etc. to a special subset of applications. For many applications, the two most critical network models are collection (many to one) and dissemination (one to many). The importance of these primitives is clear even in this TAG paper, as collection or tree routing is used to report data to a centralized controller and dissemination to distribute queries through the network.

A major problem with early work like this is that it jumped ahead to design complicated systems like this in-network aggregation system before ever mastering the basics of collection and dissemination. A good example of this is the performance of their multi-hop collection protocol; they cite 45% as the yield; however, given the network size and layout parameters they list, that low yield is more a reflection on poor algorithms and implementation than anything else. Part of this is not their fault, as modern IEEE 802.15.4 radios operate at 250kbps, which increases potential throughput, and use a more advanced QPSK encoding which significantly reduces the packet error rate compared to the older FSK radios. However, overall this paper overreaches because no service decomposition existed to allow the authors to focus on the central element of their work (the aggregation); they must consider the entire stack. It's kind of like trying to write a paper about bittorrent, but needing to design IP+Linux to do so.

One thing this paper gets right that is still topical is the incorporation of storage into the network architecture; most motes have significant amounts of non-volatile storage (flash) compared to their RAM and radio bandwidth. It is often advantageous from an energy standpoint to stream a number of packets sequentially to amortize the fixed cost of waking up a receiver and acquiring the channel.

ExOR

Sorry about not reviewing these papers on time; it was a busy day. These are some of my favorite papers because they really make clever use of opportunistic forwarding to improve efficiency. The real trick is telling who has received a message; as I recall, they do this with a backoff and snooping on the retransmitted packets. Finally, they use a standard routing protocol for final delivery. This is some fine systems work because it demonstrates a nice solution to a clear problem.

Sunday, October 5, 2008

A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols

This is a classic example of when you want to write a "tools" paper (they implemented a whole bunch of stuff in ns-2), but in order to get it published you have to actually use the tool for something. Therefore, we'll talk about their evaluation (I'll skip the discussion of how these four protocols work). I'll confine myself mostly to methodology, since the actual results are ten years old and mostly irrelevant by now.

They choose as their mobility model the infamous "random waypoint" model. The reason I say infamous is that practically nothing in the real world behaves anything like this; it is, however, easy to implement. There's an apocryphal story floating around academia that the one thing that does behave like this is tractors, and that the model was developed for networking between Caterpillars. In any case, it has been copied altogether too much. Their parameters don't improve the situation; a maximum speed of 10 m/s! That's about 22 mph, or about how fast a tractor might go. This sort of mobility model is especially pervasive in the MANET world; I could cite a bunch of other evaluations which use it...

One bright spot of their evaluation is the discussion of routing overhead, which shows that some protocols are not sensitive to mobility (DSDV-SQ), while others are, but have a very high all-around overhead (TORA).

My view on this paper is that the contribution of radio models to ns-2 is worthy of a publication, but the actual evaluation is poisoned by an almost completely unmotivated choice of the random waypoint model. If they really wanted to evaluate the performance of these protocols under mobility, they should have developed several models (random waypoint could be one of them) and compared performance across them. Other models could be an infrastructure model, or ones developed with specific applications in mind. Zebranet [1] is an example of a place where they gave some more thought to the mobility model.

[1] http://www.princeton.edu/~mrm/zebranet.html

A High-Throughput Path Metric for Multi-Hop Wireless Networking

First of all, I would usually refer to this paper as "the ETX paper," and will make a few general comments about this work. If it seems surprising that something this simple could be novel as late as 2003, I think a few points are in order. Previous routing protocols for wired networks like RIP or OSPF do allow metrics for path computation and use Dijkstra (link state) or Bellman-Ford (distance vector) to compute routes, but they tend to use rather simplistic link metrics like hop count or bandwidth. The most important difference between those link metrics and this work is that link quality in wireless cannot generally be known beforehand. Thus, the key change is not really the particular metric, but the fact that you need to do on-the-fly estimation of the link quality.

A fundamental difference between wireless and wired networks is that routing in wireless can be thought of as a two-step process. In the first step, "topology formation," a routing topology is chosen, where decisions are made about which links are to be used; in the second, "routing," actual paths are chosen through the network. What should be noted is that the innovation of ETX is to combine these two steps, since the metric implicitly weeds out poor links. In contrast, the previous technique of routing based on hop count cannot avoid the separation; in order to form a hop-count based path, you must first take a position on which links are available to be routed over. This issue is explored more deeply in Alec Woo's thesis; the basic result is that you can't just have a threshold to determine which links are good to route over. We also discuss this issue in [1].

One way this work is overly simplistic is its treatment of the estimation problem. Their estimator computes a simple windowed estimate over 10 seconds of the packet reception ratio in both directions, and inverts this to form an expected transmission count. This metric has a number of nice properties, like correlation with the energy cost of using the link, and it also provides an estimate for an appropriate number of link-layer retransmissions to use. There has been some advance on this front; more current implementations might use an EWMA filter, but there is still the parameter tuning problem of choosing a horizon. Newer link estimators like [2] attempt to add more information to further improve performance.
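
As a concrete reference, here is roughly what that windowed estimator amounts to (my sketch, not their code): measure the delivery ratio in each direction over the probe window and invert the product.

```python
# ETX = 1 / (d_f * d_r): the expected number of transmissions, including
# retransmissions, for a data packet and its link-layer ACK to get through.
class EtxEstimator:
    def __init__(self, window=10):
        self.window = window      # expected number of probes per direction
        self.fwd_heard = 0        # our probes the neighbor reports hearing
        self.rev_heard = 0        # the neighbor's probes we heard ourselves

    def record(self, fwd_heard, rev_heard):
        self.fwd_heard = fwd_heard
        self.rev_heard = rev_heard

    def etx(self):
        d_f = self.fwd_heard / self.window
        d_r = self.rev_heard / self.window
        if d_f == 0 or d_r == 0:
            return float("inf")   # unusable link
        return 1.0 / (d_f * d_r)

est = EtxEstimator(window=10)
est.record(fwd_heard=8, rev_heard=9)   # 80% forward, 90% reverse delivery
print(est.etx())                       # ~1.39 expected transmissions

# A path metric is then just the sum of per-link ETX values along the route.
```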

Their evaluation is lengthy and somewhat tedious at this point; perhaps it was more of a surprise five years ago than it is now that a decent path metric is a good idea.


[1] The Effect of Link Churn on Wireless Routing. Stephen Dawson-Haggerty, Jorge Ortiz, Xiaofan Fred Jiang, and David E. Culler. Technical Report No. UCB/EECS-2008-109, University of California, Berkeley, Computer Science Division. 2008.
[2] Four-Bit Wireless Link Estimation. Rodrigo Fonseca, Omprakash Gnawali, Kyle Jamieson, and Philip Levis. HotNets 2007.

More TCP Exploits

The never-ending stream of protocol and implementation exploits continues... This one seems to be peculiar to connection initiation; there's not really enough information to tell exactly what's going on here, but it sounds like it might be implementation specific. Although, if it's really a problem with queued connections at startup, it seems plausible that it's a bit more general than that.

http://securityandthe.net/2008/10/02/more-details-about-tcpip-attack-surface/

Thursday, October 2, 2008

Architecture and Evaluation of an Unplanned 802.11b Mesh Network

This paper covers the basics well, and at the end presents an impressive analysis of the performance of their architecture. Their architecture is relatively simple. Packets are encapsulated in a custom routing frame, and link estimation is performed using periodic beacon-ack pairs to estimate the probability of DATA-ACK success. RTS/CTS is disabled (this is typical, I believe), probably to reduce protocol overhead. Each node either routes traffic to the internet if it is connected, or else forms a default route to a gateway to deliver traffic. The use of source routing everywhere is also notable; this is an approach that really began to be introduced with DSR, as they note, and seems to be gaining favor in ad-hoc networks. I think there are a couple of reasons for this: state is local to sending nodes, and so there is never any confusion about where the packet will go. This makes routing loops impossible, and furthermore eliminates the need for recovery in the case where an intermediate node reboots and comes back up with uninitialized routing tables. This sort of local inconsistency can be very problematic for both link-state and distance-vector algorithms.

They evaluate their solution along various axes, including node density and various properties of the topology; they also show which links are most important to the routing. One aspect that I thought was particularly interesting was the multi-speed nature of the links, which has implications at the network layer. As they point out, it may be more beneficial to take more hops at a higher bit rate than fewer hops at a lower one, and due to the spacing of the different speeds, a link with link-layer losses of up to 50% may still be the "best" link. They are helped in their design by the fact that energy is not a concern; all the routers are plugged into the wall.
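
A quick back-of-the-envelope illustration of that point, with my own numbers rather than Roofnet's:

```python
# Compare the expected time to deliver a 1500-byte frame over an 11 Mbps
# link with 50% loss (so about two tries on average) versus a clean 1 Mbps
# link; the fast, lossy link still wins by a wide margin.
frame_bits = 1500 * 8

def expected_tx_time(rate_bps, delivery_prob):
    # geometric retries: expected attempts = 1 / delivery_prob
    return (frame_bits / rate_bps) / delivery_prob

print(expected_tx_time(11e6, 0.5))  # ~2.2 ms on the fast, lossy link
print(expected_tx_time(1e6, 1.0))   # ~12 ms on the slow, clean link
```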

Modeling Wireless Links for Transport Protocols

This is an almost ideal journal paper, as it consists mainly of a synthesis of ideas from the literature by experts in the field. The authors examine what is important to model in a wireless link. One reason I like this paper is that the authors focus on a sensitivity analysis concerning what is actually important to simulate in a wireless link. This is a step away from more EE-style modeling of the physical characteristics of the link using fading and noise models. I would argue that the approaches advocated in this paper are not universally accepted; the central tenet is that it is possible to distill out the independent variables controlling transport behavior and model them; these variables are things like losses, delay variation, reordering, and the queuing discipline in the router. There is, however, a cottage industry in academia involved in creating physically realistic PHY models; for instance, the TinyOS packet-level simulator tries to model noise by using measured traces to reproduce physically realistic packet successes and failures [1]. The advantage of this approach is that the model can be used to test more than just transport protocols; the entire stack can be tested at once. However, the modeling approach suggested by Gurtov & Floyd allows more freedom to explore the parameter space since it directly models the variables which impact performance, rather than relying on emergent behavior from a lower-level simulation.

Another reason this is a wonderful survey paper is that the authors consider a wide variety of wireless link technologies in order to discover what the differences are. They consider WLANs, GPRS, and satellite links which span a wide spectrum of bandwidth, latencies, and reliabilities. One point I liked was that for each technology and link characteristic (loss, delay, etc), they examined how prevalent the link characteristic is in a technology. Personally, I thought that they spent too much time on some of the characteristics and too little time on the basics of error losses and delay variation. In other words, there didn't seem to be a clear ordering of which is most likely to occur and thus is most important to model. Although it might vary between technologies, I suspect loss and jitter are important in nearly every technology.

[1] HyungJune Lee, Alberto Cerpa, and Philip Levis. "Improving Wireless Simulation Through Noise Modeling." In Proceedings of the Sixth International Conference on Information Processing in Wireless Sensor Networks (IPSN), 2007.

Tuesday, September 30, 2008

A Comparison of Mechanisms for Improving TCP Performance over Wireless Links

"Wireless TCP" is probably a popular prelim question; it also reappears from time to time in conferences; I've thought about working on it myself. The basic problem is at least clearly stated: TCP believes that all packet losses are caused by congestion, but in wireless this is likely to be incorrect. Thus, it does not perform very well over wireless links. The result of this paper is "yes, but local repair fixes the problem."

The authors evaluate a number of different possible solutions: local retransmission and a series of end-to-end modifications. Intuitively, we should be predisposed to local repair for several reasons: since it is closest to the link, it should have the best information about link quality, and should be able to react the fastest to loss. Split connections are a rather heavyweight solution to a not-that-deep problem, and furthermore will use significantly more network resources. SACKs are a worthy modification to TCP, but are best thought of as solving the bursty loss problem and helping throughput on high bandwidth-delay links.

What this problem essentially comes down to is that, if a link has a Packet Reception Rate of "p", then with "r" retransmissions, the probability of packet delivery when using link-level acknowledgments is 1 - (1-p)^r. What this says is that we can make a poor link look like a good link by estimating the value of p on the fly, and tuning r so as to guarantee a certain level of reliability. Of course, this can be complicated by bursty losses, but the intuition is correct.
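
Working out the arithmetic from that formula (illustrative numbers, my sketch):

```python
import math

def delivery_prob(p, r):
    # probability that at least one of r tries succeeds
    return 1.0 - (1.0 - p) ** r

def tries_needed(p, target):
    # smallest r with 1 - (1-p)^r >= target, by taking logs
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

print(delivery_prob(0.7, 3))     # ~0.973: a mediocre link already looks decent
print(tries_needed(0.7, 0.999))  # 6 tries make it look like an excellent link
```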

The results of the paper bear out this intuition; local repair works as well as anything, and end-to-end protocols are hurt by the fact that the controllers are far from where losses are taking place. Split also works almost as well as local, but there is no gain over just running the local protocol directly on the TCP packets.

The upshot of all of this is that when the original internet papers say the maximum drop rate of a link should be on the order of 1%, link-level retransmission is what lets a lossy wireless hop meet that target; the cost, for schemes like snoop, is peeking at TCP and IP headers which are conceptually either destined for an endpoint or else apply to each hop (hop-by-hop options).

MACAW: A Media Access Protocol for Wireless LAN's

The authors present their design for a media access control protocol for a shared 256kbps channel. Their overriding design consideration is fairness, not throughput, a decision which surfaces at numerous points in the paper. They begin with a typical RTS-CTS-DATA scheme very similar to CSMA/CA and then consider a number of modifications to it to address fairness: the receiver can contend on behalf of a sender using RRTS, the data can use link-level recovery with ACKs, or the sender can acknowledge the CTS message with a DS message before sending the payload.

The design of a workable MAC is somewhat complicated, and its design will affect power consumption, throughput, latency, and reliability. Of these axes, the authors did a reasonable job evaluating throughput; however the others are not well treated (although low latency can be a function of fair sharing between senders). My main objection to their MAC is the high protocol overhead in the RTS-CTS-DS-DATA-ACK exchange, which occurs even in the absence of channel contention. What we have found in our own research is that the authors are exactly right about local repair; the lossy nature of these links makes it inadvisable to just send a packet and assume it got there. However, our lesson has been that the additional packets involved in more complex coordination schemes result in a noisier channel, which can be an important effect with many senders. Furthermore, if the data packets are short, the data can effectively be piggybacked on the RTS; if it is received at all then no further transmissions are necessary and power consumption is reduced because of fewer sends and receives.

Several things about this paper mean that our design lessons are not directly applicable. For one, they use significantly different radio technology with a "near-field" PHY and short range. Second, their workload model is significantly different (they cite a data rate at one point of 64 packets per second). Finally, all communication is to central collection points. Given these parameters, I probably would have tried to reduce contention by designing a protocol for assigning different channels to nearby receivers to reduce cross-cell interference (this should solve the RRTS problem) and evaluated whether or not the full scheme they propose is necessary in practice.

This brings us to a final criticism of the paper, which is its evaluation methodology. MAC protocols have been shown to be very sensitive to workload and environmental factors; if a protocol is designed for a specific workload and environment, it would be significantly more convincing to have implementation results for that environment than packet-level simulation. Modeling RF communication is notoriously difficult, and there is still active work in conferences improving the simulation in ns-2 and TOSSIM. Implementation results, even if simplistic, are an important sanity check.

Tuesday, September 23, 2008

Scaling Internet Routers Using Optics

McKeown again presents a router architecture, except this time using optics. He begins with the claim that the centralized crossbar and scheduler approach from his previous paper does not scale; this is believable because of the n^2 complexity of the matching problem necessary to efficiently switch packets, and also an argument that multistage systems lead to unpredictable latencies.

He presents his new queuing design, where multiple sets of input queues are connected to output queues cyclically; I think the goal there is to reduce the non-determinism so that the switch can run faster with more predictable latency. He also considers power to be the limiting factor of how much equipment can be stuffed into a rack.

I find it difficult to tell what the magnitude of this paper's contribution is; it seems like it could be either a seminal work laying out a new architecture, or a pie-in-the-sky paper which could never be implemented.

A Fast Switched Backplane for a Gigabit Switched Router

McKeown presents a series of design decisions to allow high throughput between different ports on a network switch. He points to trends in router design towards more hardware, more parallelism, and fewer shared buses to reduce contention, and so argues that a switched backplane will be best because it reduces contention. His design consists of a star topology where each line card is connected to a single switch. The switch can connect pairs of cards, and so the challenge is to quickly form a matching between enqueued messages and receive buffers which minimizes latency and is fair. The algorithm he proposes is called iSLIP, and consists of multiple rounds of proposals after which the switch can choose a conflict-free match.

He is especially interested in reducing blocking; HOL (head of line) blocking is particularly problematic. To solve that, he introduces the VOQ, or virtual output queue, where each card has multiple output queues, one for each other line card. In this scheme, a packet can never become blocked behind a packet destined for a different port. He also notes that a combination of queuing discipline (in the sense of more VOQs plus priorities) has the same effect as more speedup (running the switch at a multiple of the line card speed).
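
Here is a minimal sketch of the VOQ structure (my illustration; the greedy matcher below is only a stand-in for the real iterative request/grant/accept rounds and rotating pointers of iSLIP):

```python
from collections import deque

class InputPort:
    def __init__(self, n_outputs):
        # one queue per output: the "virtual output queues"
        self.voq = [deque() for _ in range(n_outputs)]

    def enqueue(self, pkt, output):
        self.voq[output].append(pkt)

def schedule_slot(inputs):
    """One crossbar slot: a greedy conflict-free match between inputs
    and outputs (each output granted to at most one input)."""
    taken, matches = set(), []
    for i, port in enumerate(inputs):
        for out, q in enumerate(port.voq):
            if q and out not in taken:
                taken.add(out)
                matches.append((i, out, q.popleft()))
                break
    return matches

inputs = [InputPort(4) for _ in range(4)]
inputs[0].enqueue("a->2", 2); inputs[0].enqueue("a->3", 3)
inputs[1].enqueue("b->2", 2)
print(schedule_slot(inputs))   # [(0, 2, 'a->2')]: output 2 is contended
print(schedule_slot(inputs))   # [(0, 3, 'a->3'), (1, 2, 'b->2')]

# With a single FIFO per input, "a->3" could have been stuck behind "a->2"
# whenever output 2 was busy; with VOQs it never is.
```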

This paper outlines the full design of a small switch; I thought it was an interesting read. The techniques he mentioned are probably still valid today, since a 16-way 2.4 Gbps switch is still a reasonable piece of hardware. It was light on details, and parts were confusing; the frequent note that the Cisco 12000 uses some of the techniques didn't actually contribute to the paper, because he does not go into detail other than the general design of that system.

Wednesday, September 17, 2008

Supporting Real-Time Applications

This second Clark, Shenker, and Zhang paper presents a queueing algorithm and architecture for differentiated services. The conceit is separating traffic into two classes, one with real-time requirements and one with more flexible latency requirements. To do so, they allow the network to make service commitments, essentially providing admission control and resource reservations. They also propose a new queuing discipline, FIFO+ to address sharing between flows.

This work seems incomplete, at best, to me. The authors mention off-hand that a pricing model will be essential to provide proper incentives for the system to work correctly, yet there is no discussion of how this would fit into the rest of the scheme. Even Shenker admits in his later paper that this work is particularly unmotivated; one does not get a sense that the complexity introduced here is necessary. Furthermore, the evaluation is lacking, given the incomplete architecture presented. The largest contribution may be the paper's identification of "predicted service" applications; it seems that all internet applications may enter this class by attempting to compute the network capacity available to them on the fly.

Fundamental Design Issues for the Future Internet

This 1995 paper by Scott Shenker takes a step back from the networking research of the day and tries to predict what architectural changes will be necessary for the Internet to scale into the next decade. At that time, the internet had about 3 million hosts and tens of thousands of web sites; its transition from academic curio to a piece of public infrastructure was well underway. The challenge he focuses on is Quality of Service (QoS) guarantees, with the driving applications of streaming audio and video. The architectural question behind these demands is whether or not new service models are necessary for these applications to succeed. While IP provides best-effort delivery, real-time applications might benefit from bounded latency. The solutions he explores are admission control and fair queuing within the network core.

He considers the alternatives and problems with a differentiated services model: the largest problem is fairness, in the sense that end users would have no incentive to request anything less than the best service, which would defeat the point of the service model. The only solution he proposes is the only one that probably has any chance of working: charging different service classes different amounts for their utilization. However, this paper does not seem to seriously believe this will emerge as an evolution of the internet. Shenker also seriously considers maintaining the existing best-effort service model while over-provisioning the network to allow latency and bandwidth sensitive applications to perform well; footnote 12, where he contends that if "bandwidth does become extremely inexpensive, some of our design decisions will be altered. It might be preferable to merely add bandwidth rather than extend the service model," is prescient.

With the benefit of hindsight, it appears that internet architects may have overestimated the cost and difficulty of modifying the IP model; even the roughly contemporaneous IPv6 transition has yet to have a major impact. The basic conservatism of network operators and the choice between new, untested technology that might improve utilization versus a lot more of the old technology (more pipes) has so far been answered in favor of the well-understood solution of adding more bandwidth; moreover, with good results.

It would be simplistic to argue that adding bandwidth is the only way the internet has adapted to the demands of realtime traffic in the past ten years. The emergence of traffic-shaping middle boxes near endpoints to effectively prioritize traffic based on deep, stateful packet inspection is one development; the proliferation of content delivery overlay networks which geographically distribute latency-sensitive content to put it closer to consumers is another. Both of these have apparently been easier to do than modifying the underlying network model.

Monday, September 15, 2008

XCP

XCP is designed to deal with links with very high bandwidth-delay products. The authors claim that TCP will inevitably become unstable when run over these links; their claim is based on the fact that once the feedback period is too long, TCP will exhibit oscillatory behavior as it increases its congestion window, only to be forced to dramatically decrease it once the link becomes congested; however, the feedback loop is so long that it will have already injected a good deal of data when it receives the control signal. They also argue that slow start is too slow for these links.

As a solution, the authors propose to separate the utilization and fairness goals, and to provide them with different policies: MIMD and AIMD. Multiplicative increase allows the protocol to quickly grow the amount of bandwidth a connection uses, while multiplicative decrease still allows a quick response to congestion. Further, by adding the congestion window to the transmitted data, senders and receivers gain better information about the congestion state of the network, effectively extending the one bit of feedback other algorithms use.

One great point of the paper that the authors emphasize is the protocol's TCP-friendliness. I think this is essential for modern transport protocols; no one would risk deploying a new protocol if they did not have at least a reasonably well-founded belief that it would not clobber existing streams. Much of the rest of the paper is the necessary boilerplate to show that the protocol achieves its design goals. However, it's not really clear that the need for this protocol in the wide area has emerged since it was presented in 2002. While there are certainly high-latency links, often the TCP sessions running over them are just one in a sea of many others. If this is used anywhere, I would expect it to be in special computing facilities like data centers, supercomputers, and potentially between installations like the various national supercomputer centers.

Sunday, September 14, 2008

Random Early Detection

The basic observation this paper makes is that, since dropping packets is interpreted by many transport protocols to be a sign of congestion, dropping packets can be used to request that endpoints communicating through the router reduce their traffic. The goals of RED are twofold; one is to maintain short queues in the router so as to minimize latency, and the other is to maintain a fair allocation of bandwidth between competing flows. The protocol uses relatively simple estimators to approximate the running average of the queue depth and to drop packets at a rate which increases with the queue length. As the length of the queue increases, RED will drop proportionally more of the incoming request stream to quench more senders. Since the drops are approximately uniformly distributed in time, flows using more of the router's capacity should experience more dropped packets.
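
A hedged sketch of those two pieces, with illustrative parameter values rather than anything tuned (the real RED also accounts for idle periods and scales the drop probability by the count of packets since the last drop, which I omit):

```python
import random

class RedQueue:
    def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002):
        self.queue = []
        self.avg = 0.0                      # EWMA of the queue length
        self.min_th, self.max_th = min_th, max_th
        self.max_p, self.weight = max_p, weight

    def enqueue(self, pkt):
        # exponentially weighted moving average of the instantaneous length
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        if self.avg < self.min_th:
            drop = False
        elif self.avg >= self.max_th:
            drop = True
        else:
            # drop probability ramps up linearly between the two thresholds
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            drop = random.random() < p
        if not drop:
            self.queue.append(pkt)
        return not drop                     # False means the sender was "quenched"
```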

RED differs from FQ in several important respects. Clearly, RED is stateless with respect to individual conversations, a significant advantage. RED also explicitly extends the congestion feedback loop to include the senders by making the assumption that they will respond to dropped or marked packets by reducing their rate. As the authors mention, they are considering networks where most of the traffic is TCP streams (or DECbit). However, unlike the authors of FQ, they do not evaluate the effect of "firehose" senders being present on the network; it appears that this scheme would fail in those cases because although packets are dropped randomly, the altruistic senders would reduce their rate in response to this feedback, which would leave the greedy ones a greater share of the capacity. It is a useful result that cooperating senders eliminate the need for per-connection state in such a simple way.

This paper definitely has the feel of the journal paper that it is; a conference never would have allowed the authors to go on for so long about relatively basic math. Pedagogically speaking, though, I thought it was pretty interesting to see the problem mapped out in such detail; although the idea is very simple, the authors did a good job of examining some of the premises underlying their work. In particular, it seems like you could really get in a hole with parameter selection, where you have only rules of thumb about the actual values. They attempt to quantify the tradeoffs of each setting; with any dynamic system, I suspect the devil is in the details and so it is good to see it worked out. It strikes me that these past few papers we have read on queuing discipline have been a bit more rigorous than your run-of-the-mill CS paper; for me, this is a reason to keep them.

Wednesday, September 10, 2008

Core-Stateless Fair Queueing (CSFQ)

If the biggest objection to FQ is the per-flow state, the authors show how that state can be pushed to the edges of the network to allow core devices to function on individual packets while maintaining the nice properties of fair queueing. The actual exposition of their algorithm only takes a couple of pages, and the rest of the paper is an evaluation of the architecture and performance. In CSFQ, all per-flow state is maintained at ingress routers, which tag packets with a label representing the rate of the flow. They use an exponential filter to estimate the arrival rate of packets for a particular flow, which keeps the flow state to a few variables; this rate is written into an MPLS-style tag at the head of the packet. What seems like it could be a crucial part of the algorithm is rewriting the rate estimates in the core, by re-estimating the "fair rate" downward when packets are dropped.
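
A sketch of that edge estimator: an exponential filter over packet interarrivals (the averaging constant below is arbitrary, and the surrounding machinery that labels packets and computes the core's fair share is omitted):

```python
import math

class FlowRateEstimator:
    def __init__(self, k=0.1):              # k: averaging time constant, seconds
        self.k = k
        self.rate = 0.0                      # estimated flow rate, bytes/second
        self.last_arrival = None

    def on_packet(self, now, length):
        if self.last_arrival is not None and now > self.last_arrival:
            gap = now - self.last_arrival
            w = math.exp(-gap / self.k)      # older history decays with the gap
            self.rate = (1 - w) * (length / gap) + w * self.rate
        self.last_arrival = now
        return self.rate                     # the value the edge stamps into the label
```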

The authors' simulations seem to show CSFQ working about as well as DRR (Deficit Round Robin), which seems to be the benchmark here since it has a per-flow queueing discipline. The advantage of CSFQ is the careful state management. The other algorithms really do not perform well at all, for the same reasons the authors argued in Analysis and Simulation...

The authors mention MPLS as one example of existing labeling and classification at the edge; there's a more recent modification (MPLS-TE, for Traffic Engineering) that provides some administrative control over routes. However, I think that modern routers do implement forms of fair queueing. It may be the case that the amount of state to implement fair queueing at every router is no longer prohibitive; also, it seems that some routers implement fair queueing between administratively-defined traffic classes, which both reduces the state, since N is now the number of classes and not the number of conversations, and accomplishes part of the intent of FQ.

I'm not sure I would include both this paper and the original FQ one.

Fair Queueing

The main observation that this paper makes is that if you have a set of different workloads presented to a router, the traditional FIFO or FCFS with tail drop policy doesn't perform very well in the face of congestion (when the demand for link bandwidth is greater than the supply). The authors propose a new queuing scheme, where packet sources receive their fair share of the available bandwidth, which is defined to be 1/N with N concurrent flows. Furthermore, they want to provide lower-latency service to flows which use less than their fair share.

To achieve this, they consider a bitwise FCFS model where each flow submits one bit at a time for transmission. They notice that this model has the properties they want; for instance, if a set of large packets and one small packet are all submitted at about the same time, the small packet will finish before the large ones. Since routers can't actually act on bits, they develop a packetized queuing scheme which has the property that packets are transmitted in the same order that they would be using the bitwise algorithm. They evaluate their new algorithm with theoretical results and simulation with microbenchmarks on a number of network topologies. The motivation for the simulations is twofold: first, they wish to show that their basic derivation has good properties and is worth the cost over other existing queuing algorithms. They also mention that they have been unable to prove properties about a network of gateways all using fair queueing, and so want to validate that there isn't a breakdown when more than one router uses this algorithm.
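
A simplified sketch of the packetized emulation (mine; the paper's round-number bookkeeping is subtler, and I ignore the promptness term entirely): each packet gets a virtual finish time as if flows sent one bit at a time, and packets are transmitted in order of those tags.

```python
import heapq

class FairQueue:
    def __init__(self):
        self.vtime = 0.0
        self.last_finish = {}   # per-flow state: finish tag of its last packet
        self.heap = []          # (finish_tag, seq, flow, size)
        self.seq = 0

    def enqueue(self, flow, size):
        start = max(self.vtime, self.last_finish.get(flow, 0.0))
        finish = start + size   # small packets "finish" sooner in bit-time
        self.last_finish[flow] = finish
        heapq.heappush(self.heap, (finish, self.seq, flow, size))
        self.seq += 1

    def dequeue(self):
        finish, _, flow, size = heapq.heappop(self.heap)
        self.vtime = max(self.vtime, finish)
        return flow, size

fq = FairQueue()
fq.enqueue("bulk", 1500); fq.enqueue("bulk", 1500); fq.enqueue("telnet", 64)
print([fq.dequeue()[0] for _ in range(3)])  # telnet's small packet goes first
```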

The basic idea underlying this queuing algorithm, that participants who are not greedy and use less than their share of a resource should be rewarded with better service, seems to have some generality; it is also used in many modern process schedulers (like MLFQ) to improve response time for interactive processes, a very similar goal to what they are trying to accomplish.

This one can be a bit of a slog, but they explain what is going on well in the text and the math is mostly understandable. I never did get a good sense of what the difference between the "preemptive" and "nonpreemptive" versions of the algorithm was; the math on page 5 seems to bound the average response time, and in the preemptive case it is less... A table of constants might have helped, since there are a whole bunch of them. Fair queuing as a paper has some value, although it seems a bit hard to understand out of context. A big weakness of this work is the per-flow state; I would have liked it if the authors had explicitly stated what state is kept for each flow.

Monday, September 8, 2008

Increase and Decrease

It was the spring of hope, it was the winter of despair, and so it is with protocols; the trick is knowing when you're in spring and when you're in winter. This paper is by far the more interesting of the two, since I think it manages to take the control problem of endpoints communicating at rates subject to interference by other senders and collapse it to a linear model where they demonstrate stability under conditions not depending on the number of participants. One key idea is to reduce the feedback from the network to a single congestion signal; of course, in TCP, there isn't actually a congestion bit but rather it infers congestion from loss; in any case, the idea is the same. The authors demonstrate how additive increase and multiplicative decrease lead to streams which are "fair" and "efficient" for all participants. The work was very closely contemporaneous with Congestion Avoidance and Control, and so represents an effort underway at the time to more thoroughly understand the conditions for stable transport protocols.
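
A tiny simulation in the spirit of their phase-plane argument (my sketch, not their model): two flows increase additively until total demand exceeds capacity, then both see the congestion signal and back off multiplicatively; the multiplicative step shrinks the gap between them at every congestion event.

```python
CAPACITY, ALPHA, BETA = 100.0, 1.0, 0.5

x = [90.0, 10.0]                       # start far from fairness
for step in range(200):
    if sum(x) > CAPACITY:              # the single-bit congestion signal
        x = [BETA * xi for xi in x]    # multiplicative decrease
    else:
        x = [xi + ALPHA for xi in x]   # additive increase
print(x)                               # the two rates end up nearly equal
```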

For a reader without much background in control theory, the authors provide enough information on linear controls to make the case that their major benefit over nonlinear (say, quadratic) controls is that they are less brittle, because they require less information about the state of the system to function; in particular, not needing to know the number of participants in the control scheme seems like a key requirement for internet protocols, where this number ("n") is going to vary by many orders of magnitude.

An interesting observation is that the authors seem to have been more concerned with the OSI suite than with the internet protocols. The OSI protocols, although eventually finished and implemented, were generally regarded as a failure because they were too complicated and were never available in partly finished yet functional forms the way TCP was. By the time solid implementations were available, it was too late; this paper seems to be a symptom of the impulse to research and design every facet of the protocols up front. While that can probably be both a virtue and a vice, it certainly led to commercial failure...

The paper doesn't include any measurement studies of actual network connections, although it can serve to more fully explain figures 8 & 9 in Congestion Avoidance and Control. I think this work is mostly valuable as an examination of the assumptions underlying the internet congestion control algorithms, grounding them in a series of justified assumptions. The authors do note that they don't consider the asynchronous case where streams come and go rather than all starting at once; this is actually an important divergence from the real world, although apparently the control scheme is stable in that regime as well.

Congestion Avoidance and Control


It makes sense to put this paper first, since it is good at framing the problems TCP was experiencing. The authors introduce a number of new algorithms for improving throughput and fairness; they are all classics now, including the RTT variance estimation and slow start. One of the design points they emphasize, which has contributed to TCP's robustness, is covering a wide spectrum of bandwidth-delay product regimes, from slow packet-radio links to fast local wired networks. They develop a very simple variance estimator, implementable in just a few lines of C. They also outline "slow start", which really starts exponentially fast. I actually find their diagram on page 4 pretty confusing; I think it's because they seem to be assuming that every packet will be ACKed, and I don't think that's really the way TCP works.
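To give a flavor of what "a few lines of C" means, here is a hedged reconstruction in the spirit of the paper's appendix; the variable names and exact scaling below are my own and may not match the published code:

    /* srtt is kept scaled by 8 and rttvar by 4, so the exponentially
     * weighted averages need only shifts, adds, and subtracts. */
    void rtt_update(long measured, long *srtt, long *rttvar, long *rto)
    {
        long err = measured - (*srtt >> 3);   /* error against the smoothed RTT */
        *srtt += err;                         /* srtt <- srtt + err/8 in unscaled terms */
        if (err < 0)
            err = -err;                       /* fold the sign: track mean deviation */
        *rttvar += err - (*rttvar >> 2);      /* rttvar <- rttvar + (|err| - rttvar)/4 */
        *rto = (*srtt >> 3) + *rttvar;        /* timeout = smoothed RTT plus a multiple of
                                                 the deviation (the multiple comes from
                                                 rttvar's scaling) */
    }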

What's really interesting about this paper is its generality: any stream protocol will have to discover the available bandwidth and respond to latency changes due to causes beyond the endpoints' control. This leads to the maxim that "those who don't understand TCP are doomed to reinvent it," and based on my experience in sensor nets, we've imperfectly reinvented it at least several times. Although this is a bit of a digression, it's really interesting how TCP can take advantage of huge differences in buffering capacity (and thus window size) when devices with drastically different capabilities communicate.

The paper addresses a whole range of concerns, from the robustness of its estimators to how easily they can be implemented with only integer arithmetic; I rather suspect that the SIGCOMM version didn't include some of the sample implementation. Nonetheless, part of the beauty of this work is that the authors show you don't need complicated solutions to complicated problems, and they convince the community that although there were sophisticated ideas behind their fixes, no sophisticated implementation was required.


Tuesday, September 2, 2008

Internetworking Principles

End-to-end Arguments in System Design

Saltzer, Reed, and Clark work out the principle that the most meaningful functionality in networking, and in many other systems, should be placed at the "ends" rather than distributed throughout the "middle" or "low levels" of the system. This has a number of very interesting corollaries which follow immediately from the argument; some of them are laid out in the paper.

One design decision which draws immediately from this argument is the semantics of IP packets: packets are delivered zero or more times, in any order. The decision to settle on these best-effort delivery semantics has allowed the protocols on top to be very widely applicable. One reason for this seems to be that these semantics reflect reality: it is "easy" to build a contention-based protocol like Ethernet, or anything using wireless, that meets these guarantees with nothing more complicated than backoff and retry, whereas providing any of the stronger properties the authors mention, like FIFO delivery or duplicate suppression, almost inevitably results in rebuilding TCP. Furthermore, if those assumptions are baked into the networking fabric, it becomes much more difficult to handle broadcast domains and multicast communication patterns in the network.
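As a toy illustration of "nothing more complicated than backoff and retry" (entirely my own sketch; try_send and sleep_ms are placeholder callbacks, not real APIs):

    #include <stdlib.h>

    #define SLOT_MS 1   /* arbitrary slot time for the sketch */

    /* Try to send a frame; on failure, wait a random number of slots drawn
     * from a window that doubles after each failed attempt, then try again. */
    int send_with_backoff(const void *frame, size_t len,
                          int (*try_send)(const void *, size_t),
                          void (*sleep_ms)(unsigned))
    {
        unsigned window = 2;                        /* slots to choose from */
        for (int attempt = 0; attempt < 16; attempt++) {
            if (try_send(frame, len) == 0)
                return 0;                           /* got through this time */
            sleep_ms((rand() % window) * SLOT_MS);  /* random pause within the window */
            if (window < 1024)
                window *= 2;                        /* widen the window after a collision */
        }
        return -1;   /* best effort: after enough failures, just drop the frame */
    }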

A great section of the paper is the part where they discuss the problem of identifying the endpoints. One of their principal arguments against message acknowledgments is that application-layer acks are almost always what is desired, not message acknowledgments. An interesting way the paper has aged is the emergence of middleboxes all over networks conducting application-layer stateful packet inspection. The way this splits "end to end" into "end-middle-end" is interesting. I think that as far as the data path is concerned, the end-to-end principle still applies, since what the middlebox interposes is policy control and perhaps routing based on application criteria. In most cases, these boxes are still transparent with respect to the data the endpoints receive.


The Design Philosophy of the Darpa Internet Protocols

What's fascinating about this paper is how little has changed on the broader internet. Better tools for distributed management and policy-based routing? Accountability? Security? All remain challenges on the internet, and none of them seems to have migrated to the core; they remain point solutions within ASes. Perhaps this is to be expected from the internet architecture: newcomers will want to do the minimum possible to participate. It is interesting that even though the design emphasizes robustness to failure, it doesn't mention security as we currently understand it, which is tolerance of malicious hosts within the network. For reference, the paper's second-level goals, in order:
  1. Internet communication must continue despite loss of networks or gateways.
  2. The Internet must support multiple types of communications services.
  3. The Internet architecture must accommodate a variety of networks.
  4. The Internet architecture must permit distributed management of resources.
  5. The Internet architecture must be cost effective.
  6. The Internet architecture must permit host attachment with a low level of effort.
  7. The resources used in the internet architecture must be accountable.
Most of these, and in fact their ordering, still seem reasonable. I think you would have to amend 1 to read "Internet communication must continue despite loss of networks or gateways and the introduction of malicious hosts."