Specification and Design of Mercury
Mercury is a messaging system intended to allow users of a computer
environment to send messages to other users or to groups of other
users, and to locate users who wish to allow themselves to be located.
Messages may be optionally authenticated or encrypted. The system is
designed to be scalable, secure, private, and fast.
This document describes the Mercury system in six parts. The Terminology part describes the terms used by the
system. The Specification part describes
the Unix command-line user interface to the system, which will
demonstrate what operations the system supports and how Unix users
will use the system by default. The Design part
describes the design of the Unix client and server subsystems used by
Mercury, and, at a high level, the protocols they use to communicate.
The Libraries part describes the functionality
provided by the client and server libraries to support the system.
The Protocols part gives a formal specification
of the protocols used by the Mercury system. The Rationale part gives the motivation behind the
decisions made in the earlier parts of the document.
This part contains a glossary of terms used by the Mercury system.
- Distribution Key
- Each of the Mercury services has a distribution key which clients
use to determine which server is appropriate for a given request.
Each Mercury server except the last has a distribution key boundary
which determines which keys it is responsible for; a given Mercury
server is responsible for all requests having to do with distribution
keys from the previous server's distribution key boundary to its own.
See Distribution of Services for precise
details.
- Distribution Record
- The distribution record for a Mercury service consists of the
distribution key boundaries for all Mercury servers in a realm except for the last. Clients can use the
distribution record to determine which server is appropriate for a
given request.
- Group
- A group is a named collection of users to which messages may be
sent. An important aspect of a group is that no single user is able
to determine the members of a group; the state of groups is maintained
by the Mercury servers. If a user wishes to send a message to a known
group of users, the user must explicitly enumerate those users to the
server.
- Location
- A location of a user is the name of a machine
from which a user has an active Mercury session.
- Master Server
- During automatic load redistributions, one server is considered to
be the "master server" and coordinates the effort. The identity of
the master server is not fixed, but cycles from each load
redistribution to the next.
- Realm
- A Mercury realm is a named domain of administrative control. Each
realm has its own server machines to implement the Mercury services. Server machines within a realm may
communicate with each other in order to perform load-balancing,
database synchronization, etc., while server machines within different
realms do not communicate with each other.
- When Mercury is used in conjunction with the Hesiod name service, the
assumption is made that a Mercury realm corresponds to a Hesiod realm.
However, when Mercury is used in conjunction with the Kerberos
authentication service, there is not necessarily a correspondence
between the Mercury realm and the Kerberos realm.
- Session
- While a user is engaged in a session with a Mercury realm, that
user is able to receive messages within that realm, subscribe to
groups within that realm, and, at the user's discretion, be located as
having an active session with that realm from a given machine.
- User
- A user is a named entity which can receive personal Mercury
messages and subscribe to Mercury groups.
This part specifies the Unix command-line user interface to the
Mercury system. The purpose of this section is to specify what
operations are supported by the Mercury system, and not to specify
what the user interface must be. Other interfaces, following
different philosophies, will have different characteristics.
A user begins communication with the Mercury subsystem by running hgc,
a long-term client process which handles all communication with the
Mercury servers in all realms on behalf of a user. hgc is responsible
for carrying out commands given with hg, and is also
responsible for receiving and displaying Mercury messages. Most users
will not need to know about hgc, because it will be started for them
by the system default startup scripts.
The hgc program maintains sessions with various
Mercury realms. The hgc program will keep state recording the realms
with which to maintain sessions. If hgc is invoked with no existing state
(e.g. for the first time), barring command-line options to the
contrary, it should establish a session with a realm specified in a
system configuration file.
The hgc command will support options for specifying the default realm
and for specifying how to map Mercury realms onto machine names when
the usual location mechanisms are not desired. The hgc command will
also support extensive customizations for the processing and display
of received messages. These are details not within the scope of this
document.
When users wish to send messages or otherwise interact with the
Mercury system, they can use the hg command. By default, hg commands
operate on the current realm, but the user may override this by using
the "-r realm" option. If the user specifies a realm whose servers
hgc can locate, but with which hgc does not have an active session, hgc should begin a session
with that realm. The hg command supports the following requests:
hg allow {locate|track} [{track|locate}]
This command instructs the Mercury system to allow other users to
locate or track the user, or both.
hg begin realm [realm...]
This command begins a session with the specified realms. This command
is persistent across invocations of hgc via state stored in the user's
home directory, in that hgc should remember which realms the user has
issued begin commands for, and should establish sessions with those
realms each time it is invoked.
hg disallow {locate|track} [{track|locate}]
This command instructs the Mercury system to disallow other users from
locating or tracking the user, or both.
hg end realm [realm...]
This command ends a session with the specified realms. This command
is persistent across invocations of hgc, in that it removes the
specified realms from the list of realms with which to establish
sessions.
hg locate user [user...]
This command gives the location of one or more
users. "locate" may be abbreviated to "loc".
hg quit
This command instructs hgc to terminate, thus closing all active
Mercury sessions.
hg sendg group [group...]
This command prompts the user to enter a message and sends it to the
members of one or more groups. Additional
command-line flags will determine whether or not the message is to be
authenticated and/or encrypted. Another command-line option will
allow the user to specify a topic for the message in order to aid
client filtering; this will be left blank by default.
hg sendu user [user...]
This command prompts the user to enter a message and sends the message
to one or more users. Additional command-line flags
will determine whether or not the message is to be authenticated
and/or encrypted. "sendu" may be abbreviated to "send".
hg subscribe group [group...]
This command subscribes the user to one or more groups. Subscriptions are persistent across
invocations of hgc, via state stored in the user's home directory. An
"hg end" command for a realm destroys the subscription state for that
realm. "subscribe" may be abbreviated to "sub".
hg track user [user...]
This command instructs the Mercury system to send a notice to the user
whenever any of the users specified begin or end a Mercury session.
hg unsubscribe group [group...]
This command unsubscribes the user from one or more groups.
"unsubscribe" may be abbreviated to "unsub".
This part describes the design of the Mercury system. [More intro.]
The Mercury system consists of three cooperating services, named hgp,
hgg, and hgl. These services support personal messages, group
messages, and location information respectively. They are implemented
by the server processes hgpd, hggd, and hgld.
At the beginning of a session, the hgc program should register with
the hgp service to enable reception of personal messages, with the hgg
service if the user is subscribed to any groups, and with the hgl
service if the user wishes to allow other users to locate or track the
user.
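The registration rule above can be sketched in C as follows. This is only an illustration: the flag names and the function hg_services_to_register() are hypothetical and are not part of any library described in this document.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit flags naming the three Mercury services. */
enum {
    HG_SVC_HGP = 1 << 0,   /* personal messages */
    HG_SVC_HGG = 1 << 1,   /* group messages */
    HG_SVC_HGL = 1 << 2    /* location and tracking */
};

/*
 * Return the set of services hgc should register with at the beginning
 * of a session, following the rule stated in the text.
 */
unsigned hg_services_to_register(int num_subscribed_groups,
                                 bool allow_locate, bool allow_track)
{
    unsigned services = HG_SVC_HGP;   /* always register for personal messages */

    assert(num_subscribed_groups >= 0);
    if (num_subscribed_groups > 0)
        services |= HG_SVC_HGG;       /* only if subscribed to some group */
    if (allow_locate || allow_track)
        services |= HG_SVC_HGL;       /* only if the user may be located or tracked */
    return services;
}
```

A user with no subscriptions who has not allowed locating or tracking would therefore register with hgp only.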
In general, a user uses the hg program to communicate with the user's
hgc process. The hgc process maintains sessions with Mercury servers.
The following picture shows the communications paths between an hgc
process, an hg client process, and the servers in a Mercury realm.
In the above picture, the hgl service is implemented with a single
server machine, while the hgp and hgg services share two machines.
Note that when services are replicated in this manner, the hgc process
does not choose a single server to communicate with; it may
communicate with any of the servers.
All of the Mercury services may be distributed across several servers.
Each service has associated with it a distribution key which determines which server is appropriate
for processing a particular kind of request. For hgl, the
distribution key is the name of the user to be located or tracked.
For hgp, the distribution key is the name of the user to which a
personal message is to be sent. For hgg, the distribution key is the
name of the group to which a group message is to be sent.
Servers are distributed based on lexical divisions on the distribution
key. Given an ordering of n servers, it is possible to
describe the state of the server distribution by a record of
n - 1 strings, one for each server except the last, each giving the
lexically greatest distribution key
appropriate for that server. Clients will cache distribution records
for each service, and will receive updated distribution records when
they send a request to the "wrong" server.
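As a minimal sketch of how a client might consult a cached distribution record, the function below maps a distribution key to a server index. It assumes that "lexical" ordering means byte-wise strcmp() ordering, which this document does not specify, and the name hg_server_for_key() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the 0-based index of the server responsible for key.
 * boundaries holds n_servers - 1 strings in ascending order, where
 * boundaries[i] is the lexically greatest key handled by server i.
 * The last server has no boundary and takes every remaining key.
 */
size_t hg_server_for_key(const char *const *boundaries, size_t n_servers,
                         const char *key)
{
    size_t lo = 0, hi;

    assert(n_servers > 0 && key != NULL);
    hi = n_servers - 1;                  /* number of boundaries */

    /* Binary search for the first boundary that is >= key. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (strcmp(key, boundaries[mid]) <= 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;   /* n_servers - 1 when key is past every boundary */
}
```

With three servers and the boundaries "h" and "q", a key of "alice" would map to the first server, "mary" to the second, and "zed" to the third. If the chosen server replies with an updated record, the client would replace its cached boundaries and repeat the lookup.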
A server is responsible for storing the state related to all
distribution keys appropriate for that server. In addition, each
server is responsible for knowing the state related to the previous
server's distribution keys. Each server will inform the next server
of its state updates in packet-sized batches or when no state updates
have occurred in a fixed, short length of time. If a server loses its
state information and is restored quickly, it can query the next
server for its state; if the server remains down, the other servers
can redistribute the load among themselves. In any event, only those
few requests which occurred just before the server loses its state
information will be lost.
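The batching rule for state updates can be sketched as below. The packet capacity, the idle interval, and all names here are assumptions made for illustration; this document fixes none of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define HG_BATCH_CAPACITY  1024   /* assumed packet payload size, in bytes */
#define HG_BATCH_IDLE_SECS 2      /* assumed "fixed, short length of time" */

/* Hypothetical bookkeeping for updates waiting to go to the next server. */
typedef struct {
    size_t used;          /* bytes of pending state updates */
    time_t last_update;   /* when the most recent update was queued */
} Hg_batch;

/* True if the pending batch must be sent before this update can be queued. */
bool hg_batch_needs_flush(const Hg_batch *b, size_t update_size)
{
    assert(b != NULL && update_size <= HG_BATCH_CAPACITY);
    return b->used + update_size > HG_BATCH_CAPACITY;
}

/* Record that an update of update_size bytes was queued at time now. */
void hg_batch_queue(Hg_batch *b, size_t update_size, time_t now)
{
    assert(b != NULL && !hg_batch_needs_flush(b, update_size));
    b->used += update_size;
    b->last_update = now;
}

/* True if updates are pending and none has arrived for the idle interval. */
bool hg_batch_idle_expired(const Hg_batch *b, time_t now)
{
    assert(b != NULL);
    return b->used > 0 && difftime(now, b->last_update) >= HG_BATCH_IDLE_SECS;
}

/* Mark the batch as sent to the next server. */
void hg_batch_flushed(Hg_batch *b)
{
    assert(b != NULL);
    b->used = 0;
}
```

A server loop would call hg_batch_needs_flush() before queuing each update and hg_batch_idle_expired() on each timer tick, sending the batch to the next server whenever either returns true.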
Each server will keep statistics on the requests it is receiving
having to do with various distribution keys. Each hour, one server
(the master server, the identity of which
cycles from one server to the next each hour) will query the rest of
the servers for the number of packets they have processed (both sent
and received) in the last hour, and decide whether or not to
redistribute load, changing the distribution record. Redistributing
load is not to be done frivolously, because it will result in some
clients sending a packet to the wrong server.
If the master server does decide to redistribute load, it will compute
the average, across all servers, of the total number of packets sent and received. Then, going
from the first server to the last, the master server will determine
distribution key boundaries for each server, by adding up existing
packet loads for servers and then querying some servers for one or
more distribution keys which will yield close to correct splits in
that server's load. The following diagram shows, with simplified
numbers, how the master server would decide new distribution key
boundaries for six servers with unbalanced loads. (It does not matter
which of the six servers is the master server.)
For the situation in the diagram above, the master server would query
the second and fifth servers for one key and the sixth server for two
keys to divide their loads appropriately. One of the existing
distribution keys can be reused because it is close enough to the
right place in the existing load distribution to be used to separate
the second and third servers.
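One way the master server might plan these queries is sketched below. The function decides, for each existing server, how many new boundary keys to request from it, and counts how many existing boundaries are close enough to be reused. The name hg_plan_splits(), the tolerance parameter, and the "close enough" test are assumptions; the document does not say how closeness is judged.

```c
#include <assert.h>
#include <stddef.h>

/*
 * loads[i] is the packet count reported by server i (n servers in all).
 * tolerance is how far, in packets, a target may lie from an existing
 * boundary and still reuse that boundary's key; it is assumed to be
 * smaller than half the average load.
 * splits[i] receives the number of new keys to request from server i.
 * Returns the number of existing boundaries that can be reused.
 */
size_t hg_plan_splits(const unsigned long *loads, size_t n,
                      unsigned long tolerance, size_t *splits)
{
    unsigned long total = 0, range_start = 0, range_end;
    size_t i, server = 0, reused = 0;

    assert(loads != NULL && splits != NULL && n > 0);
    for (i = 0; i < n; i++) {
        total += loads[i];
        splits[i] = 0;
    }
    range_end = loads[0];   /* server 0 covers cumulative load (0, loads[0]] */

    for (i = 1; i < n; i++) {
        /* Cumulative load at which the i-th new boundary should fall. */
        unsigned long target = total * i / n;

        /* Advance to the existing server whose load range holds target. */
        while (server < n - 1 && range_end < target) {
            range_start = range_end;
            server++;
            range_end += loads[server];
        }

        if (server > 0 && target - range_start <= tolerance)
            reused++;            /* boundary before this server is close enough */
        else if (server < n - 1 && range_end - target <= tolerance)
            reused++;            /* boundary after this server is close enough */
        else
            splits[server]++;    /* ask this server for a key at this point */
    }
    return reused;
}
```

With invented loads of 4, 15, 3, 3, 10, and 25 packets and a tolerance of 1, this sketch requests one key from the second server, one from the fifth, and two from the sixth, and reuses one existing boundary. These numbers are made up for the example and are not taken from the diagram.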
Once the master server has built a new distribution record, [you're in
a really hairy update situation. Make sure the same strategy--and
code--can be used for dealing with a server that went down].
The Mercury system is supported by three libraries used to hide
the details of communication between parts of the system. The libhg library contains routines for communication
between Mercury clients and servers. The libhgserver library contains routines for
communication between servers. The libhgc library
contains routines to communicate on the local machine with the hgc
client program.
The libhg library contains routines for communication between Mercury
clients and the servers which implement the Mercury services.
Types
- Hg_session
- An Hg_session structure contains the state of a session with the
servers of a given realm. It should be initialized with an hg_begin_session() call, cleaned up with an
hg_end_session() call, and passed as an
argument to all functions operating on that realm.
- Hg_security
- Hg_security is an enumeration type, used to determine the security
level and algorithm used for a message. The possible security types
are:
- HG_SECURITY_NONE
- Send the message in plain text with no authentication.
- HG_SECURITY_KAUTH
- Send the message in plain text with Kerberos authentication and an
MD5 cryptographic checksum on the contents using the Kerberos session
key.
- HG_SECURITY_KDCRYPT
- Send the message with Kerberos authentication, encrypted using DES
and the Kerberos session key.
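A possible C declaration of this type is sketched below, together with helper predicates. The numeric values of the enumerators and the helper functions are assumptions; only the three enumerator names come from this document.

```c
#include <assert.h>
#include <stdbool.h>

/* Enumerator names are from the text; their values are assumed. */
typedef enum {
    HG_SECURITY_NONE,      /* plain text, no authentication */
    HG_SECURITY_KAUTH,     /* plain text, Kerberos authentication, MD5 checksum */
    HG_SECURITY_KDCRYPT    /* Kerberos authentication, DES encryption */
} Hg_security;

/* Hypothetical check a library might make before reporting
 * HG_ERROR_INVALID_SECURITY. */
bool hg_security_is_valid(int value)
{
    return value == HG_SECURITY_NONE
        || value == HG_SECURITY_KAUTH
        || value == HG_SECURITY_KDCRYPT;
}

/* True if messages sent at this level carry Kerberos authentication. */
bool hg_security_is_authenticated(Hg_security security)
{
    assert(hg_security_is_valid((int) security));
    return security != HG_SECURITY_NONE;
}

/* True if messages sent at this level are encrypted. */
bool hg_security_is_encrypted(Hg_security security)
{
    assert(hg_security_is_valid((int) security));
    return security == HG_SECURITY_KDCRYPT;
}
```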
int hg_begin_session(const char *realm, Hg_security security,
Hg_session *session);
This function establishes a session with the servers in a Mercury
realm. realm is the name of the realm to connect to, or NULL
for the default realm. security determines the security
procedure to apply to the initial startup requests. session
should be a pointer to an Hg_session object, which will remain valid
until the session is closed with an hg_end_session() call.
At the beginning of the session, the user can receive personal
messages but cannot be located and is not subscribed to any groups.
hg_begin_session() returns 0 if successful, or one of the following
error codes:
- HG_ERROR_INVALID_SECURITY
- security is not a valid security procedure.
- HG_ERROR_REALM_EXISTS
- The current program already has a session established with realm.
- HG_ERROR_REFUSED
- The hgp server refused to establish the connection. Servers may
refuse to establish connections if the initial requests are not
authenticated.
- HG_ERROR_TIMEOUT
- The hgp server could not be contacted.
int hg_send_users(Hg_session *session, const char **users,
int num_users, const char *topic,
const char *message, Hg_security security);
This function sends a personal message with the body message
and the topic topic to the users users in the realm
given by session, using the security procedure given by
security.
Upon success, hg_send_users() sets all num_users pointers in
users to NULL and returns 0. Otherwise, hg_send_users() sets
to NULL only the pointers in users for the users to which it
successfully sent message, and returns one of the following error
codes:
- HG_ERROR_INVALID_SESSION
- session is not an established session.
- HG_ERROR_INVALID_SECURITY
- security is not a valid security procedure.
- HG_ERROR_USER_UNREGISTERED
- Users whose pointers are still non-null were not registered with
the hgp server.
- HG_ERROR_TIMEOUT
- The hgp server could not be contacted; users whose pointers are
still non-null may have received the message, but probably have not.
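Because hg_send_users() reports partial delivery by clearing entries in the caller's array, a caller that wants to retry needs to gather the entries that remain. The helper below is a sketch of that step; hg_compact_unsent() is a hypothetical name and is not part of libhg.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Move the non-NULL entries of users to the front of the array,
 * preserving their order, and set the vacated tail entries to NULL.
 * Returns the number of users that still need the message.
 */
int hg_compact_unsent(const char **users, int num_users)
{
    int i, remaining = 0;

    assert(users != NULL && num_users >= 0);
    for (i = 0; i < num_users; i++) {
        if (users[i] != NULL)
            users[remaining++] = users[i];
    }
    for (i = remaining; i < num_users; i++)
        users[i] = NULL;
    return remaining;
}
```

After a call that returns HG_ERROR_TIMEOUT, a caller could pass the compacted array and the returned count to a second hg_send_users() call. Since the function overwrites the array it is given, a caller that needs the original recipient list afterwards should pass a copy.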
int hg_send_groups(Hg_session *session, const char **groups,
int num_groups, const char *topic,
const char *message, Hg_security security);
This function sends a group message with the body message
and the topic topic to the members of the groups
groups in the realm given by session, using the
security procedure given by security.
Upon success, hg_send_groups() sets all num_groups pointers
in groups to NULL and returns 0. Otherwise, hg_send_groups()
sets to NULL only the pointers in groups for the groups to which it
successfully sent message, and returns one of the
following error codes:
- HG_ERROR_INVALID_SESSION
- session is not an established session.
- HG_ERROR_INVALID_SECURITY
- security is not a valid security procedure.
- HG_ERROR_GROUP_EMPTY
- Groups whose pointers are still non-null had no subscribers.
- HG_ERROR_REFUSED
- The hgg server refused to deliver the message to the groups whose
pointers are still non-null.
- HG_ERROR_TIMEOUT
- The hgg server could not be contacted; groups whose pointers are
still non-null may have received the message, but probably have not.
The protocols
The Mercury messaging and location system is designed as a replacement
for the Zephyr messaging system in use at MIT and other sites.
Mercury is designed to correct several flaws in the Zephyr system
which limit its scalability and flexibility and increase its
complexity. This part describes the deficiencies in the Zephyr system
which Mercury is designed to correct, and then attempts to justify
some of the individual decisions made in the design of Mercury.
Deficiencies in the Zephyr System
- The concepts of "class" and "instance" behind the Zephyr system
are difficult for new users to understand.
- A Zephyr client is allowed to make any request to any server.
Therefore, in order to implement the Zephyr service efficiently, all
servers must know the state related to all queries, and all servers
must see every state update request. This is a fundamental problem
that limits the scalability of the service: as you add more servers,
you find that all of the servers are doing the same work of processing
state updates.
- The Zephyr protocol and server implementation make the
assumption that every client machine has a "host manager", a
rendezvous agent responsible for forwarding all packets from client to
server. Because the host manager operates on a well-known port, this
makes it impossible to test or use modifications to an important part
of the Zephyr client system without having control of the machine. A
separate process (zwgc) maintains each user's session with the Zephyr
servers. zwgc communicates directly with the servers for some
operations, but communicates through the host manager for others.
The purpose of the host manager is to allow the short-lived
command-line Zephyr clients to send notices to the servers without
having to worry about locating available servers and retransmitting
packets. It is simpler and more flexible for the servers to view each
user's session as a single, long-lived connection; the short-lived
command-line clients can communicate with the process that maintains
that connection. In effect, Mercury's hgc combines the functionality
of zhm and zwgc.
- The Zephyr protocol does not allow users to efficiently
communicate with a restricted group of other users; that is, the
nature of Zephyr classes and instances does not allow users to control
the people subscribed to them without having control of the Zephyr
servers. By providing support for multiple recipients in the hgp
protocol, Mercury allows users to efficiently send messages to
explicitly enumerated lists of other users.
- The Zephyr protocol uses "magic" class names and instance names
to make requests to the server, overloading the fields used to
determine the interest group of a message. This makes the protocol
difficult to describe, understand, and cleanly implement.
- The Zephyr protocol is not properly layered to allow for
effective authentication, and does not allow for encryption. Mercury
layers authentication data at the top of the packet, allowing for a
wide range of possible authentication schemes.
- The Zephyr protocol is not extensible; it is impossible to add
new fields to Zephyr packets without changing the version number.
Mercury allows new fields to be added which will be ignored by older
clients and servers.
- The Zephyr protocol does not allow the separation of the location
service from the other services it provides, nor does it allow for
the separation of the personal messaging service from the group
messaging service.
- The Zephyr library is complex and convoluted, especially in the
realm of fragmented notices, and exposes its internals to the
application programmer. This makes the library difficult to
maintain.
- The Zephyr library isn't designed to be easily made
thread-safe.
Separation of Services
The division of the Mercury system into three services presents two
disadvantages compared with the monolithic approach. First, clients must send
three packets at startup time, to initiate their sessions with all
three services, where Zephyr gets away with two and could get away
with one. The division is also expensive in terms of packets required
to obtain the distribution records for the three services. If the
services are all implemented on the same server machine, that server
will have to process more packets than it would if the protocol were
monolithic. Second, the programmer who delves for the first time into
the implementation behind the Mercury library abstraction is
confronted with three separate but similar protocols. This will
probably make the library implementation more difficult to understand
at first.
The minimal addition to the overhead of beginning and ending a session
is justified by the ability to separate the three Mercury services and
implement them on different machines. It would be possible to design
a protocol which allows for the services to be separated but does not
incur the packet overhead at the beginning and end of a session, but
it would be undesirably complicated. As for the complexity concerns,
the simplicity of implementing each individual protocol and server
balances out the complexity of having three different client-server
protocols.
Simplification of Location Access Controls
Some users may find it an unpleasant aspect of Mercury's design that
it does not support Zephyr's distinction between "realm" and "net"
exposures. Users may allow themselves to be located or tracked by
other users, or they may disallow this, but they may not restrict
locations and trackings to users within their Kerberos realm. The
designer felt that this distinction was not sufficiently useful to
warrant its inclusion in the user interface: the distinction confuses
some users who find themselves locatable by other users but not by
services such as WWW gateways, and the distinction unnecessarily
couples the user interface with the Kerberos authentication system.
Simplification of Groups
Another aspect of the Zephyr system which does not exist in Mercury is
the concept of a class and instance. The Mercury system supports only
the concept of a named "group", and a topic field on Mercury messages
intended to support client filtering. The designer felt that the
simpler concept of a group better represents the way people currently
use Zephyr.
Most users of Zephyr at MIT subscribe either to all instances of a
given class (e.g. class "sipb") using the wildcard subscription
feature, or they subscribe to instances of class "message" such as the
instance "white-magic". Either of these concepts can be represented
as group names.
A few users subscribe to specific instances of a class, e.g. class
"sipb", instance "www", but this is extremely rare, and the savings in
server packet forwarding allowed by this level of filtering do not
justify the additional server complexity it results in. Some users
also subscribe to all instances of class "message", so that they can
see all public conversations, but this class of users is quite small,
and social conventions (e.g. an informal registry of public groups)
can adequately support this kind of usage.
MIT also uses Zephyr for two operational tasks: the forwarding of
system log messages (class "syslog") and the sending of messages
related to certain file servers (class "filsrv"). System log messages
can be dealt with by personal messages (with an appropriate topic
string) and by group names of the form "syslog.machinename". Class
filsrv messages are so rare that it is not worth maintaining the state
on the messaging servers of which users are interested in precisely
which file server machines; instead, all users who use the file
servers at all can be subscribed to a group "filsrv", and filsrv
messages can be sent to the group filsrv with a topic string giving
the machine name. Clients can keep track of whether users are
actually interested in filsrv messages for a particular machine.
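The client-side filtering described above might look like the following sketch. The function name, the list of interesting machines, and the idea that a client stores that list as an array of strings are all assumptions; only the group name "filsrv" and the use of the topic string for the machine name come from this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide whether a received group message should be shown to the user.
 * Messages to groups other than "filsrv" pass through untouched; filsrv
 * messages are shown only if their topic names a machine in machines.
 */
bool hg_filsrv_wanted(const char *group, const char *topic,
                      const char *const *machines, size_t num_machines)
{
    size_t i;

    assert(group != NULL && topic != NULL);
    if (strcmp(group, "filsrv") != 0)
        return true;              /* this filter applies only to filsrv */
    for (i = 0; i < num_machines; i++) {
        if (strcmp(topic, machines[i]) == 0)
            return true;          /* the user cares about this file server */
    }
    return false;
}
```

The servers keep only one subscription per user for the filsrv group, and the per-machine interest list stays on the client, which is the trade-off the paragraph above argues for.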
Distribution criteria and algorithm
The distribution criteria and distribution algorithm satisfy several
important properties. First, the current state of the server
distribution can be expressed in a short record (which will probably
always fit into a single UDP packet), allowing it to be cached within
clients. Second, clients will almost always be able to send requests
to the correct server, so that the servers will spend very little of
their time bouncing packets back to clients with a new distribution
record. Third, no request except for a retrieval of all group or
tracking subscriptions needs to go to all of the servers for a
service, so increasing the number of servers should always be an
effective response to rising load. Fourth, the distribution algorithm
does not involve large amounts of communication between servers.
The one major count against the distribution criteria is that they are
not generic; clients are exposed to the method by which servers
determine which requests should go where. The designer does not
anticipate that anyone will discover a significantly better method for
distributing the Mercury services among several machines.