Resilient Basic Services for Distributed Systems
Final Report Abstract
Broadly speaking, the goal of this project was to further our understanding of the fundamental capabilities and limitations of distributed systems. In this context, a system is “distributed” if the critical resource is not computation, but the amount of communication. Specifically, three main topics have been studied: • Fault-tolerant hardware clock generation and distribution: Traditionally, computer systems’ computations are controlled by a common clock signal that is generated by a single quartz oscillator and distributed across the chip by a clock distribution network. If, e.g., the quartz oscillator fails, so does the system. This project laid the theoretical foundation for developing clock generation and distribution networks that are capable of sustaining multiple arbitrarily failing components and also recover from an unbounded number of transient faults. • Graph algorithms: Graphs model pairwise relations between a set of socalled “nodes”; typically, these relations simply represent a communication link. Computing specific structures on graphs is highly important, since many important and challenging tasks ultimately boil down to solve such a graph problem efficiently. Most significantly, the project resulted in much faster distributed algorithms for the Steiner Forest problem. For instance, solving this problem can be useful when connecting multiple groups of train terminals at low cost by constructing railways, for Internet providers to connect their users, or for devising the interconnection network of a computer processor. • Search algorithms for independent and simple agents: This problem is motivated by ant foraging: ants have been observed to independently (i.e., without communicating outside the nest) search for food, raising the question how they perform this task without wasting too much effort by searching the same places frequently. The idea here is that devising “good” algorithms for this task from a computer science point of view will give insights into ant behavior. In the project, tight bounds for the amount of memory that is required to search efficiently with multiple agents in a simplified model. The results are also of more general interest, as the algorithms and matching lower bounds classify a natural extension of concurrent random walks, namely those where the moving agents have some limited memory and can adapt their behavior accordingly.
Publications
-
Fault-tolerant Algorithms for Tick-generation in Asynchronous Logic: Robust Pulse Generation. Journal of ACM (JACM), Volume 61 Issue 5, August 2014, Article No. 30, 74 S.
Danny Dolev, Matthias Függer, Christoph Lenzen, and Ulrich Schmid
-
Improved Distributed Steiner Forest Construction. 33rd Symposium on Principles of Distributed Computing (PODC), 2014
Christoph Lenzen and Boaz Patt-Shamir
-
Trade-offs between Selection Complexity and Performance when Searching the Plane without Communication. 33rd Symposium on Principles of Distributed Computing (PODC), 2014
Christoph Lenzen, Nancy Lynch, Calvin Newport, and Tsvetomira Radeva