6.894 Lab 5: Do a cool project
Due date for team list: October 23.
Due date for project proposal: November 2.
Due date for paper: December 7.
Introduction
In this lab you will define your own project, execute it, and write a
paper about it. The lab is structured in two parts:
- Project proposal. The proposal is a short document (at most two
pages) describing what your project will be. It should state what
problem you are solving, why you are solving it, what software you
will write, and what results you expect. You won't be graded on your
proposal; it is there to help you get started.
- Project paper. The paper, at most 12 pages long, describes the
actual problem you solved and the software you built, and reports how
well you solved the stated problem. We expect the paper to have a
serious evaluation section, similar to the ones in the research
papers you are reading in class. Your project grade will be based on
the quality of your paper.
Doing a good project is a daunting task. In general, it is better
to tackle a small, precise problem and evaluate it well than to
tackle a large problem and get lost in its scope. To help you define
a project, we offer some suggestions below. We also expect to be
involved in all stages of your project. Please come talk to us about
your project idea, how you should execute it, what you should write
about in your final paper, and so on.
The project is to be done in teams of 3 students. Find two other
students to work with and send email to the TA with the names of
your teammates. The email is due by Tuesday, Oct. 23 (that is, in a
couple of days).
Suggestions for projects
We suggest you use the SFS software from the last four labs as the
base for your project. This lets you easily explore an interesting
problem in the context of distributed file systems without having to
build a system from the ground up. Here is a list of specific
suggestions:
- Design, implement, and evaluate a new cache replacement scheme.
You've read about UBM; we think that an alternative, simpler scheme
might work as well. We propose to measure the IRG (inter-reference
gap) for each block separately, use the gaps to predict when each
block will next be referenced, and replace the block that will be
referenced furthest in the future. The project is to implement this
simpler scheme (or a better one of your own invention) and the UBM
scheme, and to evaluate them. You can fetch some useful code from the
UBM home page.
- Design and build a server selection mechanism for SFS read-only
servers. SFS read-only allows data to be replicated on untrusted
servers. Devise and implement a scheme that lets a client contact the
"best" replica, where "best" means the one with the lowest load and
the best network connection. The basic idea would be for the central
server to return a list of replicas and for the client to pick one
using a selection algorithm of your design. One of the challenges in
this project is how to measure "best" network connections; you will
want to develop multiple schemes and evaluate them. There is a rich
literature on server selection and Internet behavior.
- Design and build a FreeNet-style SFS read-only service. The idea is
that anyone can set up a read-only replica for a particular set of
files and that requests will be routed to the appropriate server.
Clients would refer to these files by naming the public key that
signed them, and your system would route each request to a server
that mirrors the files. In contrast to the previous project, this
system has no centralized components. It will require you to design
and implement a protocol that, given a HOSTID, finds an SFS read-only
server that has the replica. This would allow anyone to volunteer to
be a secure mirror of the RedHat distribution.
- Extend SFS to include Plan 9-like naming. The idea is to give SFS
full control over all names (not just /sfs). One possible approach:
when you log into SFS, the system chroot()s to /sfs-$user, so SFS sees
every name reference and can interpret it exactly as it desires. For
example, it might interpret /bin as a union directory a la Plan 9.
Design and build the utilities that allow ordinary users to have
precise control over the construction of their name space.
- Design and build a proxy that allows access via SFS to resources
other than files on the server's disk. For example, build an SFS front
end to a database. This would be a useful tool for moving Athena
resources such as Hesiod and Moira into a Plan 9-like file name world.
Another example is device access: allow remote access to sound cards
through the file system. Access to FTP servers via SFS may also be an
interesting project. In all cases the challenge is to figure out how
to provide a sensible interface to objects that don't act like
standard UNIX files. You may be able to learn from the Plan 9 9P
protocol.
- Design and build a CIFS-to-SFS proxy. This would allow Windows
clients to access SFS. The proxy might run on a separate machine
(neither the Windows client nor the SFS storage server) and be shared
among a number of Windows clients. You may find the open-source Samba
code useful. The challenge here is that UNIX and Windows have very
different ideas about file systems and file system protocols, and
translating between them is likely to be difficult. You'll learn a
lot about real protocols (and other stuff).
- Use SFS agents to allow the use of SDSI/SPKI for key management.
SDSI/SPKI is an egalitarian public-key management system with support
for groups and sophisticated naming. SDSI/SPKI could be used to help
authenticate SFS users, SFS servers, or both.
- Use SFS to build a SDSI-like key management infrastructure.
The idea is that SDSI's linked names could be implemented with
SFS self-certifying pathnames and secure symbolic links.
A possible goal is to allow the use of SFS to flexibly manage
authentication of users and groups (for access control), in
a manner similar to the way that SFS already manages server keys.
- Design and build an on-disk file system representation consisting
of just a B-tree. You probably want to modify sfsusrv to make calls to
a B-tree package such as Berkeley DB rather than (as currently) to the
UNIX file system. Your challenges are (1) to figure out how to make
the NFS operations efficient using the B-tree and (2) to make crash
recovery work well. You can view this as an elegant simplification of
the SGI XFS file system.
- Design and implement striping for the SFS read-only protocol. The
idea is that there would be multiple replicas of the files you want;
you would contact N replica servers and split the read requests over
them, in the hope that your download completes in 1/N of the time.
You can assume that someone else (probably the main server) selects
the N best servers for a client. If you prefer, you can add striping
to your web proxy rather than to SFS. Part of the challenge is keeping
the download time from being dominated by the slowest of the N
servers.
- Choose some aspect of NFS that is slow, and redesign the protocol,
client algorithms, and/or server algorithms and disk layout to make it
faster. Implement your design and show that it's a good idea. For
example, any RPC that causes a conventional NFS server to write to the
disk tends to be slow, because servers make sure the writes are on
the disk before they send the RPC reply. NFS3 (specified in RFC 1813)
relaxes this restriction for data writes: the client continues to
buffer written data after the write RPC, allowing the server
flexibility to batch and schedule writes; the client only insists
that the server flush the writes to disk when the client wants to
reuse the buffers. Perhaps similar changes to the protocol could make
operations like file create and delete faster, by having the client
log such operations and letting the server complete them at its
convenience. The client would replay some of the log to the server
after a server crash and reboot. The challenge here is to achieve
higher performance while retaining reasonable behavior after
failures.
- Design a disk layout for a file system and implement it in an SFS
server. Make sure your layout and update algorithms have good crash
recovery properties. You may want to look at the 6.033 lab assignment
from last spring; the goal is the same, though the SFS tools are now a
bit different.
Don't worry if some other group plans to work on the same suggestion
as you do -- we can probably find a way for multiple groups to share a
general project area without significant overlap.
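The IRG-based policy proposed in the cache replacement suggestion can be tried out as a small simulation before touching SFS. This is a minimal Python sketch, not SFS code (SFS itself is written in C++). The class name IRGCache, the logical clock that ticks once per reference, the rule that a block's next reference is predicted as its last reference time plus its most recent gap, and the treatment of a block with no gap yet as never referenced again are all our assumptions.

```python
import math


class IRGCache:
    """Simulated block cache that evicts the block predicted to be referenced
    furthest in the future, using each block's inter-reference gap (IRG)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.clock = 0       # logical time: one tick per block reference
        self.last_ref = {}   # block -> time of its most recent reference
        self.last_gap = {}   # block -> its most recently observed IRG
        self.cached = set()  # blocks currently in the cache
        self.hits = 0
        self.misses = 0

    def predicted_next_ref(self, block):
        # Assumption: the next gap will equal the last one. A block seen only
        # once has no gap yet, so it is predicted never to be referenced again.
        gap = self.last_gap.get(block)
        if gap is None:
            return math.inf
        return self.last_ref[block] + gap

    def reference(self, block):
        """Record a reference to block; return True on a cache hit."""
        self.clock += 1
        if block in self.last_ref:
            self.last_gap[block] = self.clock - self.last_ref[block]
        self.last_ref[block] = self.clock

        if block in self.cached:
            self.hits += 1
            return True
        self.misses += 1
        if len(self.cached) >= self.capacity:
            # Evict the block predicted furthest in the future; break ties
            # in favor of evicting the least recently referenced block.
            victim = max(self.cached,
                         key=lambda b: (self.predicted_next_ref(b), -self.last_ref[b]))
            self.cached.remove(victim)
        self.cached.add(block)
        return False


cache = IRGCache(capacity=2)
print([cache.reference(b) for b in "ABABCAB"])
# [False, False, True, True, False, True, False]
```

A real design would also have to decide what to do with a block whose predicted reference time has already passed; this sketch keeps treating such a block as about to be referenced. Feeding the same reference trace to this simulator and to a UBM implementation is one way to compare hit rates.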
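For the server selection suggestion, one possible client-side rule is to scale each replica's measured round-trip time by its reported load and pick the cheapest replica. The sketch below is hypothetical throughout. The Replica fields, the assumption that replicas report load as a number between 0 and 1, and the cost formula with its load_weight parameter only illustrate the kind of scheme you might design and evaluate; they are not a recommended choice.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Replica:
    host: str
    rtt_ms: float  # round-trip time the client measured to this replica
    load: float    # load the replica reports: 0.0 (idle) to 1.0 (saturated)


def cost(replica: Replica, load_weight: float = 2.0) -> float:
    # Hypothetical cost: inflate the measured RTT by how busy the replica is.
    # Lower is better; load_weight is a tunable assumption.
    return replica.rtt_ms * (1.0 + load_weight * replica.load)


def pick_replica(replicas: Sequence[Replica], load_weight: float = 2.0) -> Replica:
    if not replicas:
        raise ValueError("the server returned no replicas")
    return min(replicas, key=lambda r: cost(r, load_weight))


replicas = [
    Replica("replica-a", rtt_ms=20.0, load=0.9),
    Replica("replica-b", rtt_ms=35.0, load=0.1),
    Replica("replica-c", rtt_ms=15.0, load=1.0),
]
print(pick_replica(replicas).host)                 # replica-b
print(pick_replica(replicas, load_weight=0).host)  # replica-c
```

Setting load_weight to 0 reduces the rule to "nearest replica by RTT", which makes a natural baseline to compare your own schemes against.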
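The union-directory interpretation of /bin mentioned in the Plan 9 naming suggestion can be modeled as a per-user mount table in which one path maps to an ordered list of directories, and a lookup returns the first member that contains the name. This sketch models directories as in-memory dictionaries. The NameSpace class and its methods are hypothetical; the before flag loosely mirrors the front-or-back choice that Plan 9's bind offers with -b and -a.

```python
from typing import Dict, List, Optional

# A "directory" here maps a file name to the file it names (just a string).
Directory = Dict[str, str]


class NameSpace:
    """Per-user name space in which a path can be a union of directories."""

    def __init__(self) -> None:
        self.mounts: Dict[str, List[Directory]] = {}

    def bind(self, path: str, directory: Directory, before: bool = False) -> None:
        # Add the directory to the front (before=True) or back of the union at path.
        union = self.mounts.setdefault(path, [])
        if before:
            union.insert(0, directory)
        else:
            union.append(directory)

    def lookup(self, path: str, name: str) -> Optional[str]:
        # Search the union members in order; the first match wins.
        for directory in self.mounts.get(path, []):
            if name in directory:
                return directory[name]
        return None

    def listdir(self, path: str) -> List[str]:
        # List every name in the union once, earlier members first.
        names: List[str] = []
        for directory in self.mounts.get(path, []):
            for name in directory:
                if name not in names:
                    names.append(name)
        return names


ns = NameSpace()
ns.bind("/bin", {"ls": "system ls", "cc": "system cc"})
ns.bind("/bin", {"cc": "my own cc"}, before=True)
print(ns.lookup("/bin", "cc"))  # my own cc
print(ns.lookup("/bin", "ls"))  # system ls
print(ns.listdir("/bin"))       # ['cc', 'ls']
```

In SFS the same search would happen when the server resolves a name under /sfs-$user. The user-level utilities the suggestion asks for would be the commands that edit this per-user table.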
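The slow-server problem mentioned in the striping suggestion shows up even in a simple model. The sketch below assumes each server delivers blocks at a fixed, known number of seconds per block. This is a hypothetical simplification, since real servers vary over time. It compares two illustrative assignment policies: a static round-robin split, and a dynamic one in which each server is handed the next unread block as soon as it finishes the previous one.

```python
import heapq
from typing import List


def static_striping_time(num_blocks: int, secs_per_block: List[float]) -> float:
    """Blocks pre-assigned round-robin: the download ends when the slowest share ends."""
    n = len(secs_per_block)
    finish = []
    for i, cost in enumerate(secs_per_block):
        share = len(range(i, num_blocks, n))  # server i gets blocks i, i+n, i+2n, ...
        finish.append(share * cost)
    return max(finish)


def dynamic_striping_time(num_blocks: int, secs_per_block: List[float]) -> float:
    """Each server is handed the next unread block as soon as it becomes free."""
    # Min-heap of (time the server becomes free, server index).
    free_at = [(0.0, i) for i in range(len(secs_per_block))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_blocks):
        t, i = heapq.heappop(free_at)
        done = t + secs_per_block[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


print(static_striping_time(12, [1.0, 1.0, 4.0]))   # 16.0
print(dynamic_striping_time(12, [1.0, 1.0, 4.0]))  # 8.0
```

With these made-up speeds the dynamic policy halves the download time, but it still hands the very last block to the slow server, which finishes at 8 seconds while the fast ones are idle. Re-issuing outstanding blocks to idle servers near the end is one refinement you could evaluate.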
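The client-side logging idea in the NFS suggestion can be illustrated with a toy model. Everything here is a hypothetical sketch, not part of NFS3. The client tags each create or remove with a sequence number and keeps it in a log. The server applies operations in memory and replies at once, makes them durable only when it flushes, and reports the highest sequence number it has made durable. After a crash and reboot, the client replays the log entries past that point. The Server and Client classes and the committed-sequence-number exchange are our assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Op = Tuple[str, str]  # ("create" or "remove", file name)


@dataclass
class Server:
    """Toy server: applies operations in memory at once, makes them durable on flush."""
    stable: Set[str] = field(default_factory=set)   # names that survive a crash
    current: Set[str] = field(default_factory=set)  # in-memory view of the directory
    applied_seq: int = 0    # highest sequence number applied in memory
    committed_seq: int = 0  # highest sequence number known to be on disk

    def apply(self, seq: int, op: Op) -> None:
        kind, name = op
        if kind == "create":
            self.current.add(name)
        else:
            self.current.discard(name)
        self.applied_seq = seq

    def flush(self) -> int:
        # Write the in-memory state to "disk"; report how far that covers.
        self.stable = set(self.current)
        self.committed_seq = self.applied_seq
        return self.committed_seq

    def crash_and_reboot(self) -> int:
        # Lose everything not yet flushed; report the durable point.
        self.current = set(self.stable)
        self.applied_seq = self.committed_seq
        return self.committed_seq


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.next_seq = 1
        self.log: List[Tuple[int, Op]] = []  # operations not yet known to be durable

    def do(self, op: Op) -> None:
        # Log first, then send; the reply does not wait for the server's disk.
        self.log.append((self.next_seq, op))
        self.server.apply(self.next_seq, op)
        self.next_seq += 1

    def trim(self, committed_seq: int) -> None:
        # Drop entries the server has reported as durable.
        self.log = [(s, op) for s, op in self.log if s > committed_seq]

    def replay(self, committed_seq: int) -> None:
        # After a server reboot, resend everything past the durable point, in order.
        self.trim(committed_seq)
        for seq, op in self.log:
            self.server.apply(seq, op)


server = Server()
client = Client(server)
client.do(("create", "a"))
client.do(("create", "b"))
client.trim(server.flush())  # operations 1 and 2 are now durable
client.do(("remove", "a"))
client.do(("create", "c"))
durable = server.crash_and_reboot()
print(durable, sorted(server.current))  # 2 ['a', 'b']
client.replay(durable)
print(sorted(server.current))           # ['b', 'c']
```

Replaying in sequence order from the durable point restores the pre-crash state in this toy because there is only one client. A real protocol would also have to cope with replayed operations that conflict with changes made by other clients, which is part of the challenge of keeping reasonable behavior after failures.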
Your Paper
This section provides some suggestions and guidelines on writing style
and some of the things we will look for in your final paper.
Suggestions on Writing Style
Your paper should be as long as is necessary to explain the problem,
your solution, the reasons for your choices, and your analysis of
your solution.
It should be no longer than that. The body of your paper must not
exceed twelve 11-point, single-spaced pages in length. Please use
1-inch margins. In general, your paper's style and arrangement should
be similar to the papers we've read in class.
A good paper begins with an abstract. The abstract is a very short
summary of the entire paper. It is not an outline of the
organization of the paper! It states the problem to be addressed (in
one sentence). It states the essential points of your solution,
without any detailed justification. And it announces any conclusions
you have drawn. Good abstracts can fit in 100-150 words, at most.
The body of your paper should expand the points made in the abstract.
Here you should:
- Introduce the problem and the externally imposed constraints.
- State the goals of your solution clearly.
- Describe the design of your solution.
You may wish to divide the description into a high level
architecture and a set of lower-level implementation decisions.
This would be a good place for pictures and diagrams.
- Analyze how well the system you built fulfills your goals.
Depending on your system, the analysis might deal with
performance in the sense of throughput or running time;
but keep in mind that for some systems, factors such as
reliability and usability may be as important as performance,
or more so.
- Briefly review related work in the area of your project.
The goal is to show either how you extended existing work
or how you improved on it.
- Conclude with a review of lessons to be learned from your work.
- Document your sources, giving a list of all references (including
personal communications). The style of your citations (references) and
bibliography should be similar to the styles in the technical papers you're
reading in this class. In particular, a bibliography at the end with
no citations in the text of your paper is insufficient; as we read
your paper, we'd like to see which specific pieces of information you
learned from which sources.
Write for an audience that understands basic O/S and network concepts
and has a fair amount of experience applying them in various
situations, but has not thought carefully about the particular problem
you are dealing with.
How do we evaluate your paper?
When evaluating your paper, we will look at both content
and writing.
Some content considerations:
- Do you provide motivation for why the problem you chose is
worthwhile or interesting?
- Does your solution address the goals you stated?
- Do you explain your decisions and the trade-offs?
- How complex is your solution? Simple is better, yet sometimes simple
won't do the job. But unnecessary complexity is bad.
- Does your solution fit well with the rest of the system? If your
solution requires modifying every piece of hardware, software, and
data in sight, it won't be credible unless you can come up with a
very good story about why everything needs to be changed.
- Is your analysis clear?
Some writing considerations:
- Is the report easy to comprehend?
- Is it well organized and coherent?
- Does it use diagrams where appropriate? (A frequent problem when
people use word processors is that they try to express everything in
words, either because the word processor doesn't make it easy to
include diagrams or because they have never learned how to use its
drawing features. Pictures can communicate some ideas far better.)
- Does it use the concepts, models, and terminology used in the
course? If not, does it have a good reason for using a different
universe of discourse?
- Is there a good abstract and bibliography?
You can find other helpful suggestions on writing this kind of report in
the M.I.T. Writing Program's on-line guide to writing Design and Feasibility
Reports. You may also want to look at the Mayfield Handbook's
explanation of IEEE documentation style. A very good book on writing
style is "The Elements of Style," by William Strunk Jr. and E. B.
White, Third Ed., Macmillan Publishing Co., New York, NY, 1979.
What to Hand In
You should e-mail your team list to jinyang@lcs.mit.edu by
October 23.
Your team should e-mail its proposal to jinyang@lcs.mit.edu
by November 2. It should be no more than two pages.
E-mail a PostScript file containing your final paper, and (separately)
a uuencoded tar file containing your source by December 7th. Your
project grade will be based on the paper, not on the source.
Make sure you save enough time to write a good paper, since that's
what will determine your grade!