TY - GEN
T1 - The Penn State Computing Condominium Scheduling System
AU - Agnihotri, Pawan
AU - Agarwala, Vijay K.
AU - Nucciarone, Jeffrey J.
AU - Morooney, Kevin M.
AU - Das, Chita
N1 - Publisher Copyright: © 1998 IEEE.
PY - 1998
Y1 - 1998
AB - The Penn State RS/6000 SP is a uniquely acquired and operated computing facility. This 143-CPU machine, centrally located and jointly owned, is the result of a collaboration between academic departments, research groups, and the central academic computing facility. It is the largest on-campus resource at Penn State for meeting high-performance computing needs. Because of the machine's joint ownership structure, its job scheduling requirements differ significantly from the usual methods of allocating processors to jobs on distributed-memory parallel machines. After several years of adapting different queuing systems to our needs, primarily the Distributed Queuing System, it became obvious that conventional scheduling systems did not serve the scheduling requirements unique to the Penn State SP. We concluded that a robust and easily configurable system needed to be developed to meet these needs. We have drawn inspiration from EASY and modeled our system on it. As with EASY, we use the application programming interface of LoadLeveler to implement our scheduler, which is named the Penn State Condominium Scheduler (PSCS). PSCS implements scheduling policy, while LoadLeveler executes jobs on the machine. PSCS is written to facilitate easier configuration and administration. Like the native scheduler in LoadLeveler, it has no processor architecture dependence. PSCS incorporates three unique features: (i) node owner affinity, which ensures fairness by allocating based on ownership; (ii) backfilling, which ensures efficient utilization of resources; and (iii) affinity for services provided, which ensures proper matching of jobs to processors based on memory, software, and other requirements. Jobs from users who own nodes in the SP complex have affinity to the particular processors they own. These users are also granted preferences according to their ownership level. Once the demand from the node owners is met, the next important goal is to keep the machine as fully occupied with running jobs as possible. This is accomplished by backfilling. The scheduler incorporates the features that are most important to the successful implementation of multi-owner, centrally located, heterogeneous computing facilities.
UR - http://www.scopus.com/inward/record.url?scp=84888950626&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84888950626&partnerID=8YFLogxK
U2 - 10.1109/SC.1998.10002
DO - 10.1109/SC.1998.10002
M3 - Conference contribution
AN - SCOPUS:84888950626
T3 - Proceedings of the International Conference on Supercomputing
BT - SC 1998 - Proceedings of the ACM/IEEE Conference on Supercomputing
PB - Association for Computing Machinery
T2 - 1998 ACM/IEEE Conference on Supercomputing, SC 1998
Y2 - 7 November 1998 through 13 November 1998
ER -