MIT Information Systems

Athena Longjobs Project


Summary The Athena Longjobs project addresses a long-standing problem in the Athena computing environment: users must remain present at the console of an Athena workstation for the duration of a session, even when they run long procedures that require no user interaction. The goal is to deliver a solution that is consistent with other Athena-based services, and to make the service available to those courses and students with a demonstrable academic need.


Status The existing service is now officially supported. The service software is maintained by the Athena Unix team, and the server machines by Athena Server Operations. Users are supported by the Faculty Liaisons, Athena Consulting, and Athena User Accounts.


Milestones
Delivery
  • 12/2002: Hand-off to support
  • 11/2002: Held second training session for consultants.
  • 08/2002: Completed documentation and web forms.
  • 08/2002: Held first training session for consultants.
  • 08/2002: Updated software on master.
  • 08/2002: Deployed new Linux slaves.
  • 08/2002: Updated SGI slaves to Athena 9.1.
  • 08/2002: Deployed Sun Netra X1 slaves, replacing Ultra 5's.
  • 07/2002: Completed port to Athena 9.1.
  • 04/2002: Obtained sponsor approval for purchase of 6 Sun Netra X1 servers to replace the current Ultra 5's for the fall term.
  • 04/2002: Completed write-up on determining charge rates for sponsored research use.
  • 03/2002: Determined that supporting time adjustment factors for slaves of significantly varying speeds will probably require future development work.
  • 03/2002: Obtained sponsor approval for purchase of 5 Linux slaves, and ordered machines.
  • 02/2002: Began test of service for Spring 2002 term, using latest software.
  • 02/2002: Completed a capacity analysis tool, to be used to determine whether a subscription request should be granted.
  • 02/2002: New wrapper script for performing common operations.
  • 02/2002: Decision to purchase and deploy Linux slaves.
  • 01/2002: Implemented changes to allow user accounts to be added directly to certain groups in the quota database on the master, so the administrator does not have to wait for the group feed from Moira.
  • 01/2002: Deployed disk mirroring on the master's critical partitions.
  • 12/2001: Completed job and queue status changes, so that users can view all queued jobs, but not group membership or other private information.
  • 12/2001: Implemented fully encrypted connections for client-to-server and server-to-server communications.
  • 12/2001: Completed Fall test; analyzed usage and feedback. Identified need for better documentation on testing scripts, job dependencies, and the machines in the slave pool.
  • 11/2001: Heavy use of test service by participating class provides first real test under system load.
  • 11/2001: Identified policy issues to be addressed; Owls agreed that it would be the right forum to present issues.
  • 10/2001: Transition to Delivery
Discovery


Future Work In addition to correcting any problems that might arise with the existing service, we will maintain the service by porting the software each year to the current Athena release, and by upgrading and replacing the existing server machines as needed.

We have also identified the following as areas for potential enhancement, which would require significant additional development effort:
  • Support machines of varying speed within the same platform type
  • Billing (for sponsored research use)
  • Automate handling of temporary disk space needs, e.g. via a special area in AFS, or a separate AFS cell
  • Upgrade software, e.g. to take advantage of added features in PBSPro
  • Replace the job scheduler for better handling of class reservations


Other Documents


Contact information Public discussion: longjobs@mit.edu
The longjobs team: longjobs-dev@mit.edu

Last modified: Tue Jan 7 16:14:46 EST 2003