To: mbarker@MIT.EDU
Cc: wdc@MIT.EDU, rbasch@MIT.EDU
Date: Fri, 30 May 1997 18:11:09 EDT

This is an initial list of possible requirements and issues to be
considered for a "long jobs" (batch) system.

First, the following are general considerations for a batch system:

- interactive vs. batch only
- shared file system vs. file staging
- parallel execution
- recreation of submit environment
- job dependency
- configurable dispatching factors:
  - system load
  - resources available and required:
    - CPU type
    - CPU load
    - memory
    - disk
- load balancing
- checkpoint
  - system
  - user
- administration:
  - job monitoring
  - rescheduling
  - suspension/resumption
  - cancel
  - change priority (nice)
  - migration
  - runtime limits
- support/track subprocesses
- provide job statistics
- fault tolerance
  - redundancy (i.e. no single point of failure)
- security
  - authentication
  - authorization/access control
  - encryption
- GUI/Web/command line client tools for submit, job status, modify/cancel
- ease of use
- cost
- licensing
- platform coverage

The following are possible additional questions to be addressed for the
Athena environment:

- AFS shared file system support
- Kerberos authentication
  - allow jobs > 1 day (i.e. more than Kerberos ticket limit)
  - submit for future execution
- Job encryption
- single- or multi-user execution systems
- limits on what applications can be run, monitor usage
- special authorization needs
  - sign-up
  - faculty control
- need root access within job?
- require password on submit?

Bob