Summary |
The Athena Longjobs project attempts to solve a long-standing problem
in the Athena computing environment: the requirement that users remain
present at the console of an Athena workstation for the duration of a
session, even when they need to run long procedures that require no user
interaction. The goal is to deliver a solution that is consistent with
the attributes of other Athena-based services, and to make the service
available to those courses and students with a demonstrable academic need.
|
|
Status |
The existing service is now officially supported. The service software
is maintained by the Athena Unix team, and the server machines by
Athena Server Operations.
Users are supported by the Faculty Liaisons, Athena Consulting, and
Athena User Accounts.
|
|
Milestones |
Delivery
- 12/2002: Handed off to support.
- 11/2002: Held second training session for consultants.
- 08/2002: Completed documentation and web forms.
- 08/2002: Held first training session for consultants.
- 08/2002: Updated software on master.
- 08/2002: Deployed new Linux slaves.
- 08/2002: Updated SGI slaves to Athena 9.1.
- 08/2002: Deployed Sun Netra X1 slaves, replacing Ultra 5's.
- 07/2002: Completed port to Athena 9.1.
- 04/2002: Obtained sponsor approval for purchase of 6 Sun Netra X1 servers
to replace the current Ultra 5's for the fall term.
- 04/2002: Completed write-up on determining charge rates for
sponsored research use.
- 03/2002: Identified that support of time adjustment factors, for
slaves of significantly varying speeds, will probably require some future
development work.
- 03/2002: Obtained sponsor approval for purchase of 5 Linux
slaves, and ordered machines.
- 02/2002: Began test of the service for the Spring 2002 term, using the
latest software.
- 02/2002: Completed a capacity analysis tool, to be used to
determine whether a subscription request should be granted.
- 02/2002: Added a new wrapper script for performing common operations.
- 02/2002: Decided to purchase and deploy Linux slaves.
- 01/2002: Implemented changes to allow user accounts to be added directly
to certain groups in the quota database on the master, so the
administrator does not have to wait for the group feed from Moira.
- 01/2002: Deployed disk mirroring on the master's critical
partitions.
- 12/2001: Completed job and queue status changes, so that users
can view all queued jobs, but not group membership or other private
information.
- 12/2001: Implemented fully encrypted connections for client-to-server
and server-to-server communications.
- 12/2001: Completed Fall test; analyzed usage and feedback. Identified need for better
documentation on testing scripts, job dependencies, and the machines
in the slave pool.
- 11/2001: Heavy use of the test service by a participating class provided
the first real test under system load.
- 11/2001: Identified policy issues to be addressed; Owls agreed that it
would be the right forum in which to present them.
- 10/2001: Transitioned to Delivery.
Discovery
|
|
Future Work |
In addition to correcting any problems that might arise with the
existing service, we will maintain the service by porting the software each
year to the current Athena release, and by upgrading and replacing the
existing server machines as needed.
We have also identified the following as areas for potential enhancement,
which would require significant additional development effort:
- Support machines of varying speed within the same platform type
- Implement billing (for sponsored research use)
- Automate handling of temporary disk space needs, e.g. via a special
area in AFS, or a separate AFS cell
- Upgrade software, e.g. to take advantage of added features in
PBSPro
- Replace the job scheduler for better handling of class reservations
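As an illustration of the first item (machines of varying speed), one possible approach is a per-machine time adjustment factor that scales a job's requested time limit by the relative speed of the slave it lands on. The sketch below is only an assumption about how such a factor could work; it does not describe the existing longjobs software or any scheduler feature, and the function names, host names, and speed values are all hypothetical.

```python
# Hypothetical relative speeds: 1.0 is the reference machine,
# 2.0 is twice as fast, 0.5 is half as fast. Values are made up.
SPEED_FACTORS = {
    "reference-slave": 1.0,
    "fast-slave": 2.0,
    "slow-slave": 0.5,
}


def adjusted_time_limit(requested_seconds, speed_factor):
    """Scale a requested time limit for a slave of the given relative speed.

    A faster slave (factor > 1) gets a proportionally shorter limit,
    a slower slave (factor < 1) a proportionally longer one.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return requested_seconds / speed_factor


def limits_for_pool(requested_seconds, speed_factors=SPEED_FACTORS):
    """Return the adjusted time limit for every slave in the pool."""
    return {
        host: adjusted_time_limit(requested_seconds, factor)
        for host, factor in speed_factors.items()
    }


if __name__ == "__main__":
    # A job that requests one hour on the reference machine.
    for host, limit in limits_for_pool(3600).items():
        print(f"{host}: {limit:.0f} seconds")
```

Dividing by the factor keeps the amount of work a job may perform roughly constant across machines; how the factors themselves would be measured is left open here.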
|
|
Other Documents |
|
|
Contact information |
Public discussion: longjobs@mit.edu
The longjobs team: longjobs-dev@mit.edu
|