6.852/18.437 Distributed Algorithms
Fall 2009
Instructor: Nancy Ann Lynch
TA: Alex Cornejo Collado
Lecture:
Tuesday and Thursday 11:00AM-12.30PM
(1-135)
TA Office Hours: Tuesday and Wednesday 5:00PM-6:00PM
(32-G6 Lounge)
What is this course about?:
Distributed Algorithms are algorithms designed to run on multiple processors, without tight centralized control. In general, they are harder to design, and harder to understand, then single-processor sequential algorithms. Distributed algorithms are used in many practical systems, ranging from large computer networks to multiprocessor shared-memory systems. They also have a rich theory, which forms the subject matter for this course.
Theoretical results about distributed algorithms appear in research conferences such as PODC (Principles Of Distributed Computing), DISC (International Symposium on DIStributed Computing), OPODIS (International Conference on Principles of Distributed Systems), and SPAA (ACM Symposium on Parallelism in Algorithms and Architectures). They also appear in general theoretical computer science conferences such as FOCS (Foundations of Computer Science) and STOC (Symposium on Theory of Computing), and in broad conferences involving distributed computing, such as ICDCS (International Conference on Distributed Computing Systems).
What do distributed algorithms researchers do? They (we) define various kinds of distributed computing environments, including shared-memory and network-based environments, and identify problems to be solved in those environments. They design new algorithms for these problems, and analyze the correctness, performance, and fault-tolerance of their algorithms. They also prove lower bounds and other impossibility results, which explain why certain tasks cannot be carried out in certain kinds of distributed settings, or cannot be carried out within certain cost constraints. Researchers typically study problems that arise in practical distributed systems, including problems of communication, data management, resource management, synchronization, and distributed agreement. Sometimes, the results have impact on practical distributed system design.
This year, the course will be taught by Prof. Lynch, with some help from Dr. Victor Luchangco, a member of the Scalable Systems research group at Sun Microsystems Research. The ``core'' of the material will consist of basic distributed algorithms and impossibility results, as covered in Lynch's book ``Distributed Algorithms''. This will be supplemented by some updated material on topics such as self-stabilization, wait-free computability, and failure detectors, and some new material on scalable shared-memory concurrent programming.
6.852 is a graduate-level course that is intended to do two things: provide an introduction to the most important basic results in the area of distributed algorithms, and prepare interested students to begin independent research or take a more advanced course in distributed algorithms. Usually, the students who take the course are a mixture of PhD students and MEng students. Since the course is run at a PhD level, most MEng students should find it challenging.
Prerequisites:
To take 6.852, you should have:
-
``Mathematical maturity''. In particular, you should be very good at reading and writing mathematical proofs.
-
General knowledge about some distributed systems. For instance, MIT's undergraduate course 6.033, Computer Systems Engineering, would be good background.
-
Experience with sequential algorithms and their analysis. MIT's undergrad course 6.046 is sufficient.
-
(Desirable, but not essential) Experience with formal models of computation. MIT's course 6.045 or 6.840 would be fine for this.
Announcements
Problem Set 6 Part a - Clarifications
Also, in slide 28, titled Evaluation, of lecture 20, it says "[the] serialization point [of obstruction-free transactions] can be the commit point (CAS).". This is a typo, it should say "CAN'T be the commit point.." and proving this statement is the purpose of problem 4 of the problem set.
Finally a friendly reminder that part b of the problem set is online and the problem set is due December 3rd. Happy holidays.
Announced on 25 November 2009 10:18 a.m. by Alex Cornejo Collado
Something to think about over the holiday weekend.
Is it possible to implement a wait-free 3-process FIFO queue atomic object,
using only 2-process consensus objects and any-number-of-process read/write registers?
Warning: Lots of smart people have already spent lots of time trying to solve this.
Announced on 24 November 2009 4:04 p.m. by Alex Cornejo Collado
Problem Set 4 Part a, clarification
Thanks to David for pointing this out.
Announced on 28 October 2009 6:15 p.m. by Alex Cornejo Collado
Problem Set 3 Part b, Problem 4 Wording.
The question is asking for a description of the best algorithm you can come up with that solves the k-pseudo-sesion problem and its worst-case time complexity. The time complexity is measured according to T(A) as defined in p. 557 of the textbook.
Announced on 21 October 2009 5:30 p.m. by Alex Cornejo Collado
Problem Set 3 Typo
The first problem should not be Exercise 8.8 (since this was already solved in Pset 2), but the following exercise:
Exercise 15.34. Try to find the smallest graph (i.e., with the fewest nodes) and the shortest execution (i.e., with the fewest number of messages) that exhibits the behavior described.
The PDF on the website has been corrected to reflect this change.
Announced on 12 October 2009 9:38 p.m. by Alex Cornejo Collado