Mercury Vapor

Specification and Design of Mercury

Mercury is a messaging system intended to allow users of a computer environment to send messages to other users or to groups of other users, and to locate users who wish to allow themselves to be located. Messages may be optionally authenticated or encrypted. The system is designed to be scalable, secure, private, and fast.

Overview

This document describes the Mercury system in six parts. The Terminology part describes the terms used by the system. The Specification part describes the Unix command-line user interface to the system, which will demonstrate what operations the system supports and how Unix users will use the system by default. The Design part describes the design of the Unix client and server subsystems used by Mercury, and, at a high level, the protocols they use to communicate. The Libraries part describes the functionality provided by the client and server libraries to support the system. The Protocols part gives a formal specification of the protocols used by the Mercury system. The Rationale part gives the motivation behind the decisions made in the earlier parts of the document.

Terminology

This part contains a glossary of terms used by the Mercury system.

Distribution Key: Each of the Mercury services has a distribution key which clients use to determine which server is appropriate for a given request. Each Mercury server except the last has a distribution key boundary which determines which keys it is responsible for; a given Mercury server is responsible for all requests having to do with distribution keys from the previous server's distribution key boundary to its own. See Distribution of Services for precise details.
Distribution Record: The distribution record for a Mercury service consists of the distribution key boundaries for all Mercury servers in a realm except for the last. Clients can use the distribution record to determine which server is appropriate for a given request.
Group: A group is a named collection of users to which messages may be sent. An important aspect of a group is that no single user is able to determine the members of a group; the state of groups is maintained by the Mercury servers. If a user wishes to send a message to a known group of users, the user must explicitly enumerate those users to the server.
Location: A location of a user is the name of a machine from which a user has an active Mercury session.
Master Server: During automatic load redistributions, one server is considered to be the "master server" and coordinates the effort. The identity of the master server is not fixed, but cycles from each load redistribution to the next.
Realm: A Mercury realm is a named domain of administrative control. Each realm has its own server machines to implement the Mercury services. Server machines within a realm may communicate with each other in order to perform load-balancing, database synchronization, etc., while server machines within different realms do not communicate with each other. When Mercury is used in conjunction with the Hesiod name service, the assumption is made that a Mercury realm corresponds to a Hesiod realm. However, when Mercury is used in conjunction with the Kerberos authentication service, there is not necessarily a correspondence between the Mercury realm and the Kerberos realm.
Session: While a user is engaged in a session with a Mercury realm, that user is able to receive messages within that realm, subscribe to groups within that realm, and, at the user's discretion, be located as having an active session with that realm from a given machine.
User: A user is a named entity which can receive personal Mercury messages and subscribe to Mercury groups.

Specification

This part specifies the Unix command-line user interface to the Mercury system. The purpose of this section is to specify what operations are supported by the Mercury system, and not to specify what the user interface must be. Other interfaces, following different philosophies, will have different characteristics.

hgc: The long-lived client process

A user begins communication with the Mercury subsystem by running hgc, a long-lived client process which handles all communication with the Mercury servers in all realms on behalf of a user. hgc is responsible for carrying out commands given with hg, and is also responsible for receiving and displaying Mercury messages. Most users will not need to know about hgc, because it will be started for them by the system default startup scripts.

The hgc program maintains sessions with various Mercury realms, and keeps persistent state recording which realms it should maintain sessions with. If hgc is invoked with no existing state (e.g. for the first time), then, barring command-line options to the contrary, it should establish a session with a realm specified in a system configuration file.

The hgc command will support options for specifying the default realm and for specifying how to map Mercury realms onto machine names when the usual location mechanisms are not desired. The hgc command will also support extensive customizations for the processing and display of received messages. These are details not within the scope of this document.

hg: The short-lived command-line interface

When users wish to send messages or otherwise interact with the Mercury system, they can use the hg command. By default, hg commands operate on the current realm, but the user may override this by using the "-r realm" option. If the user specifies a realm whose servers hgc can locate, but with which hgc does not have an active session, hgc should begin a session with that realm. The hg command supports the following requests:

hg allow {locate|track} [{track|locate}]

This command instructs the Mercury system to allow other users to locate or track the user, or both.

hg begin realm [realm...]

This command begins a session with the specified realms. This command is persistent across invocations of hgc via state stored in the user's home directory, in that hgc should remember which realms the user has issued begin commands for, and should establish sessions with those realms each time it is invoked.

hg disallow {locate|track} [{track|locate}]

This command instructs the Mercury system to disallow other users from locating or tracking the user, or both.

hg end realm [realm...]

This command ends a session with the specified realms. This command is persistent across invocations of hgc, in that it removes the specified realms from the list of realms with which to establish sessions.

hg locate user [user...]

This command gives the location of one or more users. "locate" may be abbreviated to "loc".

hg quit

This command instructs hgc to terminate, thus closing all active Mercury sessions.

hg sendg group [group...]

This command prompts the user to enter a message and sends it to the members of one or more groups. Additional command-line flags will determine whether or not the message is to be authenticated and/or encrypted. Another command-line option will allow the user to specify a topic for the message in order to aid client filtering; this will be left blank by default.

hg sendu user [user...]

This command prompts the user to enter a message and sends the message to one or more users. Additional command-line flags will determine whether or not the message is to be authenticated and/or encrypted. "sendu" may be abbreviated to "send".

hg subscribe group [group...]

This command subscribes the user to one or more groups. Subscriptions are persistent across invocations of hgc, via state stored in the user's home directory. An "hg end" command for a realm destroys the subscription state for that realm. "subscribe" may be abbreviated to "sub".

hg track user [user...]

This command instructs the Mercury system to send a notice to the user whenever any of the users specified begin or end a Mercury session.

hg unsubscribe group [group...]

This command unsubscribes the user from one or more groups. "unsubscribe" may be abbreviated to "unsub".
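The request names and the abbreviations listed above could be resolved with a simple table lookup. The following is a minimal sketch, not part of any specified interface: the table layout and the function name hg_canonical_request() are hypothetical, and only exact matches are accepted, so "sen" is rejected rather than treated as an ambiguous prefix of both "sendg" and "sendu".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table pairing each hg request with the abbreviation the
 * specification allows for it (NULL where none is defined). */
struct hg_request {
    const char *name;
    const char *abbrev;
};

static const struct hg_request hg_requests[] = {
    { "allow",       NULL },
    { "begin",       NULL },
    { "disallow",    NULL },
    { "end",         NULL },
    { "locate",      "loc" },
    { "quit",        NULL },
    { "sendg",       NULL },
    { "sendu",       "send" },
    { "subscribe",   "sub" },
    { "track",       NULL },
    { "unsubscribe", "unsub" },
};

/* Return the canonical request name for word, or NULL if hg does not
 * recognize it. Only exact names or exact abbreviations match. */
const char *hg_canonical_request(const char *word)
{
    size_t n = sizeof hg_requests / sizeof hg_requests[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(word, hg_requests[i].name) == 0)
            return hg_requests[i].name;
        if (hg_requests[i].abbrev != NULL &&
            strcmp(word, hg_requests[i].abbrev) == 0)
            return hg_requests[i].name;
    }
    return NULL;
}
```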

Design

This part describes the design of the Mercury system: the Unix client and server subsystems, and, at a high level, the protocols they use to communicate.

Services

The Mercury system consists of three cooperating services, named hgp, hgg, and hgl. These services support personal messages, group messages, and location information respectively. They are implemented by the server processes hgpd, hggd, and hgld.

At the beginning of a session, the hgc program should register with the hgp service to enable reception of personal messages, with the hgg service if the user is subscribed to any groups, and with the hgl service if the user wishes to allow other users to locate or track the user.

Communication Paths

In general, a user uses the hg program to communicate with the user's hgc process. The hgc process maintains sessions with Mercury servers. The following picture shows the communications paths between an hgc process, an hg client process, and the servers in a Mercury realm.

In the above picture, the hgl service is implemented with a single server machine, while the hgp and hgg services share two machines. Note that when a service is distributed across several machines in this manner, an hgc process does not choose a single server to communicate with; it may communicate with any of the servers, sending each request to the server indicated by that service's distribution record.

Distribution of Services

All of the Mercury services may be distributed across several servers. Each service has associated with it a distribution key which determines which server is appropriate for processing a particular kind of request. For hgl, the distribution key is the name of the user to be located or tracked. For hgp, the distribution key is the name of the user to which a personal message is to be sent. For hgg, the distribution key is the name of the group to which a group message is to be sent.

Servers are distributed based on lexical divisions on the distribution key. Given an ordering of n servers, the state of the server distribution can be described by a record of n - 1 strings, one for each server except the last, giving the lexically greatest distribution key appropriate for that server. Clients will cache distribution records for each service, and will receive updated distribution records when they send a request to the "wrong" server.
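A client holding a cached distribution record could find the responsible server with a binary search over the boundaries. This is a sketch under two assumptions not fixed by this document: "lexical" ordering is taken to mean byte-wise strcmp() ordering, and each boundary is inclusive, so a key equal to a boundary belongs to the server that owns that boundary (consistent with the boundary being the greatest key appropriate for that server). The function name hg_find_server() is hypothetical.

```c
#include <string.h>

/* Return the 0-based index of the server responsible for key, given a
 * distribution record of num_servers - 1 boundaries in increasing order.
 * Server i handles keys k with boundaries[i-1] < k <= boundaries[i];
 * the last server handles every key above the final boundary.
 * Assumes lexical order is strcmp() order. */
int hg_find_server(const char *const *boundaries, int num_servers,
                   const char *key)
{
    int lo = 0, hi = num_servers - 1;   /* the answer lies in [lo, hi] */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;   /* mid < hi, so boundaries[mid] exists */
        if (strcmp(key, boundaries[mid]) <= 0)
            hi = mid;                   /* key is in server mid's range or earlier */
        else
            lo = mid + 1;               /* key is above server mid's boundary */
    }
    return lo;
}
```

If the chosen server answers with a newer distribution record instead of processing the request, the client would replace its cached record and repeat the lookup.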

A server is responsible for storing the state related to all distribution keys appropriate for that server. In addition, each server is responsible for knowing the state related to the previous server's distribution keys. Each server will inform the next server of its state updates in packet-sized batches, or sooner if no further state updates have occurred within a fixed, short length of time. If a server loses its state information and is restored quickly, it can query the next server for its state; if the server remains down, the other servers can redistribute the load among themselves. In any event, only those few requests which occurred just before the server lost its state information will be lost.
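The batching rule for forwarding state updates to the next server can be sketched as a buffer that is sent either when the next update would overflow a packet or when no update has arrived for a fixed interval. The packet size, the idle interval, the Hg_update_batch type and the batch_*() functions are all hypothetical; the document does not specify how a state update is encoded or how a batch is transmitted, so sending is left to a caller-supplied callback.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define HG_BATCH_BYTES  512  /* hypothetical usable payload of one packet */
#define HG_IDLE_SECONDS 2    /* hypothetical "fixed, short length of time" */

/* Caller-supplied routine that sends one batch to the next server. */
typedef void (*Hg_flush_fn)(const unsigned char *data, size_t len, void *arg);

typedef struct {
    unsigned char buf[HG_BATCH_BYTES];
    size_t len;            /* bytes currently queued */
    time_t last_update;    /* when the most recent update was queued */
    Hg_flush_fn flush;
    void *arg;
} Hg_update_batch;

void batch_init(Hg_update_batch *b, Hg_flush_fn flush, void *arg)
{
    b->len = 0;
    b->last_update = 0;
    b->flush = flush;
    b->arg = arg;
}

/* Send whatever is queued, if anything. */
static void batch_flush(Hg_update_batch *b)
{
    if (b->len > 0) {
        b->flush(b->buf, b->len, b->arg);
        b->len = 0;
    }
}

/* Queue one encoded update. Returns -1 if it could never fit in a packet. */
int batch_add(Hg_update_batch *b, const unsigned char *update, size_t n,
              time_t now)
{
    if (n > HG_BATCH_BYTES)
        return -1;
    if (b->len + n > HG_BATCH_BYTES)
        batch_flush(b);                 /* packet is full: send it first */
    memcpy(b->buf + b->len, update, n);
    b->len += n;
    b->last_update = now;
    if (b->len == HG_BATCH_BYTES)
        batch_flush(b);                 /* exactly one packet's worth */
    return 0;
}

/* Call periodically; sends a partial batch once updates have gone quiet. */
void batch_tick(Hg_update_batch *b, time_t now)
{
    if (b->len > 0 && difftime(now, b->last_update) >= HG_IDLE_SECONDS)
        batch_flush(b);
}
```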

Each server will keep statistics on the requests it is receiving having to do with various distribution keys. Each hour, one server (the master server, the identity of which cycles from one server to the next each hour) will query the rest of the servers for the number of packets they have processed (both sent and received) in the last hour, and decide whether or not to redistribute load, changing the distribution record. Redistributing load is not to be done frivolously, because it will result in some clients sending a packet to the wrong server.

If the master server does decide to redistribute load, it will compute the average total number of packets sent and received. Then, going from the first server to the last, the master server will determine distribution key boundaries for each server, by adding up existing packet loads for servers and then querying some servers for one or more distribution keys which will yield close to correct splits in that server's load. The following diagram shows, with simplified numbers, how the master server would decide new distribution key boundaries for six servers with unbalanced loads. (It does not matter which of the six servers is the master server.)

For the situation in the diagram above, the master server would query the second and fifth servers for one key and the sixth server for two keys to divide their loads appropriately. One of the existing distribution keys can be reused because it is close enough to the right place in the existing load distribution to be used to separate the second and third servers.
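The planning step of a redistribution can be sketched as follows. Each new boundary should fall where the cumulative load reaches a multiple of the average load; if an existing boundary lies within a tolerance of that point it is reused, otherwise the server whose load range contains the point is the one to query, together with roughly what fraction of its load should fall below the new key. The Hg_split_plan type, the function name, and the tolerance parameter are hypothetical, and the request by which the master asks a server for a key at a given fraction of its load is not specified here.

```c
/* One planned boundary: either reuse an existing one or query a server. */
typedef struct {
    int reuse;        /* index i of the existing boundary after server i, or -1 */
    int server;       /* index of the server to query for a new key, or -1 */
    double fraction;  /* share of that server's load that should fall below the key */
} Hg_split_plan;

/* Plan the n - 1 new boundaries for n servers with the given hourly loads.
 * tolerance is a fraction of the average load (e.g. 0.1 for 10%). */
void hg_plan_splits(const double *loads, int n, double tolerance,
                    Hg_split_plan *plans)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += loads[i];
    double avg = total / n;

    for (int j = 1; j < n; j++) {
        double target = j * avg;        /* cumulative load at new boundary j */
        Hg_split_plan *p = &plans[j - 1];
        p->reuse = -1;
        p->server = -1;
        p->fraction = 0.0;

        /* Reuse the nearest existing boundary if it is close enough. */
        double cum = 0.0, best = tolerance * avg;
        for (int i = 0; i < n - 1; i++) {
            cum += loads[i];
            double d = cum > target ? cum - target : target - cum;
            if (d <= best) {
                best = d;
                p->reuse = i;
            }
        }
        if (p->reuse >= 0)
            continue;

        /* Otherwise find the server whose load range contains the target. */
        cum = 0.0;
        for (int i = 0; i < n; i++) {
            if (target < cum + loads[i] || i == n - 1) {
                p->server = i;
                p->fraction = loads[i] > 0.0 ? (target - cum) / loads[i] : 0.0;
                break;
            }
            cum += loads[i];
        }
    }
}
```

With hypothetical per-server loads of 5, 34, 3, 3, 25 and 50 packets and a tolerance of 0.1 (10% of the average load of 20), this plan queries the second and fifth servers for one key each and the sixth server for two, and reuses the existing boundary between the second and third servers, which is the pattern described above. The tolerance should stay well below 0.5, or a single existing boundary could be chosen for two different splits.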

Once the master server has built a new distribution record, [you're in a really hairy update situation. Make sure the same strategy--and code--can be used for dealing with a server that went down].

Libraries

The Mercury system is supported by three client libraries used to hide the details of communication between parts of the system. The libhg library contains routines for communication between Mercury clients and servers. The libhgserver library contains routines for communication between servers. The libhgc library contains routines to communicate on the local machine with the hgc client program.

libhg

The libhg library contains routines for communication between Mercury clients and the servers which implement the Mercury services.

Types

Hg_session

An Hg_session structure contains the state of a session with the servers of a given realm. It should be initialized with an hg_begin_session() call, cleaned up with an hg_end_session() call, and passed as an argument to all functions operating on that realm.

Hg_security

Hg_security is an enumeration type, used to determine the security level and algorithm used for a message. The possible security types are:

HG_SECURITY_NONE: Send the message in plain text with no authentication.
HG_SECURITY_KAUTH: Send the message in plain text with Kerberos authentication and an MD5 cryptographic checksum on the contents using the Kerberos session key.
HG_SECURITY_KDCRYPT: Send the message with Kerberos authentication, encrypted using DES and the Kerberos session key.
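In C, the security levels above might be declared as follows. This is a sketch: the document names the constants but does not fix their numeric values or the header they live in, and the helper functions are hypothetical conveniences, for instance for deciding when to return HG_ERROR_INVALID_SECURITY.

```c
/* Hypothetical declaration; the numeric values are not specified. */
typedef enum {
    HG_SECURITY_NONE,    /* plain text, no authentication */
    HG_SECURITY_KAUTH,   /* plain text, Kerberos authentication, keyed MD5 checksum */
    HG_SECURITY_KDCRYPT  /* Kerberos authentication, DES encryption with the session key */
} Hg_security;

/* Nonzero if s is one of the defined security procedures. */
int hg_security_valid(Hg_security s)
{
    return (int)s >= HG_SECURITY_NONE && (int)s <= HG_SECURITY_KDCRYPT;
}

/* Nonzero if messages sent with s carry Kerberos authentication. */
int hg_security_authenticates(Hg_security s)
{
    return s == HG_SECURITY_KAUTH || s == HG_SECURITY_KDCRYPT;
}

/* Nonzero if message bodies sent with s are encrypted. */
int hg_security_encrypts(Hg_security s)
{
    return s == HG_SECURITY_KDCRYPT;
}
```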


int hg_begin_session(const char *realm, Hg_security security,
		     Hg_session *session);

This function establishes a session with the servers in a Mercury realm. realm is the name of the realm to connect to, or NULL for the default realm. security determines the security procedure to apply to the initial startup requests. session should be a pointer to an Hg_session object, which will remain valid until the session is closed with an hg_end_session() call.

At the beginning of the session, the user can receive personal messages but cannot be located and is not subscribed to any groups.

hg_begin_session() returns 0 if successful, or one of the following error codes:

HG_ERROR_INVALID_SECURITY: security is not a valid security procedure.
HG_ERROR_REALM_EXISTS: The current program already has a session established with realm.
HG_ERROR_REFUSED: The hgp server refused to establish the connection. The server may refuse to establish connections if the initial requests are not authenticated.
HG_ERROR_TIMEOUT: The hgp server could not be contacted.


int hg_send_users(Hg_session *session, const char **users,
		  int num_users, const char *topic,
		  const char *message, Hg_security security);

This function sends a personal message with the body message and the topic topic to the users users in the realm given by session, using the security procedure given by security.

Upon success, hg_send_users() sets all num_users pointers in users to NULL and returns 0. Otherwise, hg_send_users() sets to NULL the pointer for each user to whom it successfully sent message, and returns one of the following error codes:

HG_ERROR_INVALID_SESSION: session is not an established session.
HG_ERROR_INVALID_SECURITY: security is not a valid security procedure.
HG_ERROR_USER_UNREGISTERED: Users whose pointers are still non-null were not registered with the hgp server.
HG_ERROR_TIMEOUT: The hgp server could not be contacted; users whose pointers are still non-null may have received the message, but probably have not.
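Because a failed call leaves non-NULL exactly the recipients it did not reach, a caller can retry just those. The helper below is a hypothetical convenience, not part of libhg as specified: it packs the remaining pointers to the front of the array and returns their count, so the same array and the new count can be passed back to hg_send_users().

```c
#include <stddef.h>

/* After a failed hg_send_users() call, the recipients that were not
 * reached are exactly the entries still non-NULL. Pack them to the front
 * of the array, preserving their order, and return how many remain. */
int hg_compact_undelivered(const char **names, int num_names)
{
    int kept = 0;
    for (int i = 0; i < num_names; i++) {
        if (names[i] != NULL)
            names[kept++] = names[i];
    }
    return kept;
}
```

A caller might, for example, retry once after HG_ERROR_TIMEOUT with the compacted list. Since users still listed after a timeout may already have received the message, such a retry can produce duplicates; whether that is acceptable is an application decision.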


int hg_send_groups(Hg_session *session, const char **groups,
		   int num_groups, const char *topic,
		   const char *message, Hg_security security);

This function sends a group message with the body message and the topic topic to the subscribers of the groups groups in the realm given by session, using the security procedure given by security.

Upon success, hg_send_groups() sets all num_groups pointers in groups to NULL and returns 0. Otherwise, hg_send_groups() sets to NULL the pointer for each group to which it successfully sent message, and returns one of the following error codes:

HG_ERROR_INVALID_SESSION: session is not an established session.
HG_ERROR_INVALID_SECURITY: security is not a valid security procedure.
HG_ERROR_GROUP_EMPTY: Groups whose pointers are still non-null had no subscribers.
HG_ERROR_REFUSED: The hgg server refused to deliver the message to the groups whose pointers are still non-null.
HG_ERROR_TIMEOUT: The hgg server could not be contacted; groups whose pointers are still non-null may have received the message, but probably have not.

Protocols

The protocols

Rationale

The Mercury messaging and location system is designed as a replacement for the Zephyr messaging system in use at MIT and other sites. Mercury is designed to correct several flaws in the Zephyr system which limit its scalability and flexibility and increase its complexity. This part describes the deficiencies in the Zephyr system which Mercury is designed to correct, and then attempts to justify some of the individual decisions made in the design of Mercury.

Deficiencies in the Zephyr System

The concepts of "class" and "instance" behind the Zephyr system are difficult for new users to understand.
A Zephyr client is allowed to make any request to any server. Therefore, in order to implement the Zephyr service efficiently, all servers must know the state related to all queries, and all servers must see every state update request. This is a fundamental problem that limits the scalability of the service: as you add more servers, you find that all of the servers are doing the same work of processing state updates.
The Zephyr protocol and server implementation makes the assumption that every client machine has a "host manager", a rendezvous agent responsible for forwarding all packets from client to server. Because the host manager operates on a well-known port, this makes it impossible to test or use modifications to an important part of the Zephyr client system without having control of the machine. A separate process (zwgc) maintains each user's session with the Zephyr servers. zwgc communicates directly with the servers for some operations, but communicates through the host manager for others.
The purpose of the host manager is to allow the short-lived command-line Zephyr clients to send notices to the servers without having to worry about locating available servers and retransmitting packets. It is simpler and more flexible for the servers to view each user's session as a single, long-lived connection; the short-lived command-line clients can communicate with the process that maintains that connection. In effect, Mercury's hgc combines the functionality of zhm and zwgc.
The Zephyr protocol does not allow users to efficiently communicate with a restricted group of other users; that is, the nature of Zephyr classes and instances does not allow users to control the people subscribed to them without having control of the Zephyr servers. By providing support for multiple recipients in the hgp protocol, Mercury allows users to efficiently send messages to explicitly enumerated lists of other users.
The Zephyr protocol uses "magic" class names and instance names to make requests to the server, overloading the fields used to determine the interest group of a message. This makes the protocol difficult to describe, understand, and cleanly implement.
The Zephyr protocol is not properly layered to allow for effective authentication, and does not allow for encryption. Mercury layers authentication data at the top of the packet, allowing for a wide range of possible authentication schemes.
The Zephyr protocol is not extensible; it is impossible to add new fields to Zephyr packets without changing the version number. Mercury allows new fields to be added which will be ignored by older clients and servers.
The Zephyr protocol does not allow the separation of the location service from the other services it provides, nor does it allow for the separation of the personal messaging service from the group messaging service.
The Zephyr library is complex and convoluted, especially in the realm of fragmented notices, and exposes its internals to the application programmer. This makes the library difficult to maintain.
The Zephyr library isn't designed to be easily made thread-safe.

Separation of Services

The division of the Mercury system into three services presents two disadvantages over the monolithic approach. First, clients must send three packets at startup time, to initiate their sessions with all three services, where Zephyr gets away with two and could get away with one. The division is also expensive in terms of packets required to obtain the distribution records for the three services. If the services are all implemented on the same server machine, that server will have to process more packets than it would if the protocol were monolithic. Second, the programmer who delves for the first time into the implementation behind the Mercury library abstraction is confronted with three separate but similar protocols. This will probably make the library implementation more difficult to understand at first.

The minimal addition to the overhead of beginning and ending a session is justified by the ability to separate the three Mercury services and implement them on different machines. It would be possible to design a protocol which allows for the services to be separated but does not incur the packet overhead at the beginning and end of a session, but it would be undesirably complicated. As for the complexity concerns, the simplicity of implementing each individual protocol and server balances out the complexity of having three different client-server protocols.

Simplification of Location Access Controls

Some users may find it an unpleasant aspect of Mercury's design that it does not support Zephyr's distinction between "realm" and "net" exposures. Users may allow themselves to be located or tracked by other users, or they may disallow this, but they may not restrict locations and trackings to users within their Kerberos realm. The designer felt that this distinction was not sufficiently useful to warrant its inclusion in the user interface: the distinction confuses some users who find themselves locatable by other users but not by services such as WWW gateways, and the distinction unnecessarily couples the user interface with the Kerberos authentication system.

Simplification of Groups

Another aspect of the Zephyr system which does not exist in Mercury is the concept of a class and instance. The Mercury system supports only the concept of a named "group", and a topic field on Mercury messages intended to support client filtering. The designer felt that the simpler concept of a group better represents the way people currently use Zephyr.

Most users of Zephyr at MIT subscribe either to all instances of a given class (e.g. class "sipb") using the wildcard subscription feature, or they subscribe to instances of class "message" such as the instance "white-magic". Either of these concepts can be represented as group names.

A few users subscribe to specific instances of a class, e.g. class "sipb", instance "www", but this is extremely rare, and the savings in server packet forwarding allowed by this level of filtering do not justify the additional server complexity it results in. Some users also subscribe to all instances of class "message", so that they can see all public conversations, but this class of users is quite small, and social conventions (e.g. an informal registry of public groups) can adequately support this kind of usage.

MIT also uses Zephyr for two operational tasks: the forwarding of system log messages (class "syslog") and the sending of messages related to certain file servers (class "filsrv"). System log messages can be dealt with by personal messages (with an appropriate topic string) and by group names of the form "syslog.machinename". Class filsrv messages are so rare that it is not worth maintaining state on the messaging servers recording which users are interested in which particular file server machines; instead, all users who use the file servers at all can be subscribed to a group "filsrv", and filsrv messages can be sent to the group filsrv with a topic string giving the machine name. Clients can keep track of whether users are actually interested in filsrv messages for a particular machine.

Distribution criteria and algorithm

The distribution criteria and distribution algorithm satisfy several important properties. First, the current state of the server distribution can be expressed in a short record (which will probably always fit into a single UDP packet), allowing it to be cached within clients. Second, clients will almost always be able to send requests to the correct server, so that the servers will spend very little of their time bouncing packets back to clients with a new distribution record. Third, no request except for a retrieval of all group or tracking subscriptions needs to go to all of the servers for a service, so increasing the number of servers should always be an effective response to rising load. Fourth, the distribution algorithm does not involve large amounts of communication between servers.

The one major point against the distribution criteria is that they are not generic; clients are exposed to the method by which servers determine which requests should go where. The designer does not anticipate that anyone will discover a significantly better method for distributing the Mercury services among several machines.