Course»Course 6»Fall 2009»6.852/18.437»Homepage

6.852/18.437  Distributed Algorithms

Fall 2009

Instructor: Nancy Ann Lynch

TA: Alex Cornejo Collado

Lecture:  Tuesday and Thursday 11:00AM-12.30PM  (1-135)
TA Office Hours:  Tuesday and Wednesday 5:00PM-6:00PM  (32-G6 Lounge)      

What is this course about?: 

Distributed Algorithms are algorithms designed to run on multiple processors, without tight centralized control. In general, they are harder to design, and harder to understand, then single-processor sequential algorithms. Distributed algorithms are used in many practical systems, ranging from large computer networks to multiprocessor shared-memory systems. They also have a rich theory, which forms the subject matter for this course.

Theoretical results about distributed algorithms appear in research conferences such as PODC (Principles Of Distributed Computing), DISC (International Symposium on DIStributed Computing), OPODIS (International Conference on Principles of Distributed Systems), and SPAA (ACM Symposium on Parallelism in Algorithms and Architectures). They also appear in general theoretical computer science conferences such as FOCS (Foundations of Computer Science) and STOC (Symposium on Theory of Computing), and in broad conferences involving distributed computing, such as ICDCS (International Conference on Distributed Computing Systems).

What do distributed algorithms researchers do? They (we) define various kinds of distributed computing environments, including shared-memory and network-based environments, and identify problems to be solved in those environments. They design new algorithms for these problems, and analyze the correctness, performance, and fault-tolerance of their algorithms. They also prove lower bounds and other impossibility results, which explain why certain tasks cannot be carried out in certain kinds of distributed settings, or cannot be carried out within certain cost constraints. Researchers typically study problems that arise in practical distributed systems, including problems of communication, data management, resource management, synchronization, and distributed agreement. Sometimes, the results have impact on practical distributed system design.

This year, the course will be taught by Prof. Lynch, with some help from Dr. Victor Luchangco, a member of the Scalable Systems research group at Sun Microsystems Research. The ``core'' of the material will consist of basic distributed algorithms and impossibility results, as covered in Lynch's book ``Distributed Algorithms''. This will be supplemented by some updated material on topics such as self-stabilization, wait-free computability, and failure detectors, and some new material on scalable shared-memory concurrent programming.

6.852 is a graduate-level course that is intended to do two things: provide an introduction to the most important basic results in the area of distributed algorithms, and prepare interested students to begin independent research or take a more advanced course in distributed algorithms. Usually, the students who take the course are a mixture of PhD students and MEng students. Since the course is run at a PhD level, most MEng students should find it challenging.

 

Prerequisites:

To take 6.852, you should have:

  • ``Mathematical maturity''. In particular, you should be very good at reading and writing mathematical proofs.

  • General knowledge about some distributed systems. For instance, MIT's undergraduate course 6.033, Computer Systems Engineering, would be good background.

  • Experience with sequential algorithms and their analysis. MIT's undergrad course 6.046 is sufficient.

  • (Desirable, but not essential) Experience with formal models of computation. MIT's course 6.045 or 6.840 would be fine for this.

Announcements

Problem Set 6 Part a - Clarifications

During yesterdays office hours we realized the lecture notes do not contain a description of invisible readers. In an invisible read, the transaction does not write any control information into the shared memory to inform the other transactions on possible read/write conflicts. Hence note that in the context of obstruction-free STM, invisible reads require a read validation step (described in the notes).

Also, in slide 28, titled Evaluation, of lecture 20, it says "[the] serialization point [of obstruction-free transactions] can be the commit point (CAS).". This is a typo, it should say "CAN'T be the commit point.." and proving this statement is the purpose of problem 4 of the problem set.

Finally a friendly reminder that part b of the problem set is online and the problem set is due December 3rd. Happy holidays.

Announced on 25 November 2009  10:18  a.m. by Alex Cornejo Collado

Something to think about over the holiday weekend.

Here is an open question you can think about, if you get bored during the long holiday weekend.

Is it possible to implement a wait-free 3-process FIFO queue atomic object,
using only 2-process consensus objects and any-number-of-process read/write registers?

Warning:  Lots of smart people have already spent lots of time trying to solve this.

Announced on 24 November 2009  4:04  p.m. by Alex Cornejo Collado

Problem Set 4 Part a, clarification

The third question of Problem Set 4 Part a, (the one regarding the Mattern paper) reads "...in a maximal consistent cut M <= K", it should read "M <= C". Other references to K in that same question should be replaced by C. A corrected version of the PDF is now online to avoid confusing.

Thanks to David for pointing this out.

Announced on 28 October 2009  6:15  p.m. by Alex Cornejo Collado

Problem Set 3 Part b, Problem 4 Wording.

During office hours a few of you were confused by the wording of problem 4, on problem set 3 part b. If you don't have any doubts about this question, then please disregard this message completly.

The question is asking for a description of the best algorithm you can come up with that solves the k-pseudo-sesion problem and its worst-case time complexity. The time complexity is measured according to T(A) as defined in p. 557 of the textbook.

Announced on 21 October 2009  5:30  p.m. by Alex Cornejo Collado

Problem Set 3 Typo

There is a typo (I included the wrong exercise) in problems set 3 part a.

The first problem should not be Exercise 8.8 (since this was already solved in Pset 2), but the following exercise:

Exercise 15.34. Try to &#64257;nd the smallest graph (i.e., with the fewest nodes) and the shortest execution  (i.e., with the fewest number of messages) that exhibits the behavior described.

The PDF on the website has been corrected to reflect this change.

Announced on 12 October 2009  9:38  p.m. by Alex Cornejo Collado

View archived announcements