The Penn State Computing Condominium Scheduling System

Pawan Agnihotri, Vijay K. Agarwala, Jeffrey J. Nucciarone, Kevin M. Morooney, Chita Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

The Penn State RS/6000 SP is a uniquely acquired and operated computing facility. This 143-CPU machine, centrally located and jointly owned, is the result of a collaboration among academic departments, research groups, and the central academic computing facility. It is the largest on-campus resource at Penn State for meeting high-performance computing needs. Because of the machine's joint ownership structure, its job scheduling requirements differ significantly from the usual methods of allocating processors to jobs on distributed-memory parallel machines. After several years of adapting different queuing systems, primarily the Distributed Queuing System, to our needs, it became obvious that conventional scheduling systems could not meet the scheduling requirements unique to the Penn State SP. We concluded that a robust and easily configurable system needed to be developed for this purpose. We drew inspiration from EASY and modeled our system on it; as with EASY, we use the LoadLeveler application programming interface to implement our scheduler. Our scheduler is named the Penn State Condominium Scheduler (PSCS). PSCS implements the scheduling policy, while LoadLeveler executes the jobs on the machine. PSCS is written to ease configuration and administration. Like the native LoadLeveler scheduler, it has no processor-architecture dependence. PSCS incorporates three distinctive features: (i) node owner affinity, which ensures fairness by allocating processors according to ownership; (ii) backfilling, which ensures efficient utilization of resources; and (iii) affinity for services provided, which ensures that jobs are properly matched to processors based on memory, software, and other requirements. Jobs from users who own nodes in the SP complex have affinity for the particular processors those users own. These users are also granted preferences according to their ownership level.
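
The owner-affinity and service-affinity rules described above can be illustrated with a small node-selection sketch. This is not PSCS code and does not use the LoadLeveler API. The `Node` and `Job` classes, the `ownership` share table, the three-tier ranking (the group's own nodes, then centrally owned nodes, then other owners' nodes), and the treatment of each node as one schedulable unit are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Node:
    name: str
    owner: Optional[str]               # owning group, or None for centrally owned nodes
    memory_mb: int
    software: frozenset = frozenset()  # software packages installed on the node

@dataclass
class Job:
    name: str
    group: str                         # group that submitted the job
    nodes: int                         # nodes requested (one node = one unit, an assumption)
    memory_mb: int = 0                 # minimum memory required per node
    software: frozenset = frozenset()  # software packages the job needs

def satisfies_services(node: Node, job: Job) -> bool:
    """Service affinity: the node must meet the job's memory and software needs."""
    return node.memory_mb >= job.memory_mb and job.software <= node.software

def affinity_rank(node: Node, job: Job) -> int:
    """Owner affinity: a lower rank means the node is preferred for this job."""
    if node.owner == job.group:
        return 0  # the submitting group's own nodes first
    if node.owner is None:
        return 1  # then centrally owned nodes
    return 2      # other owners' nodes last, keeping them free for their owners

def order_queue(queue: List[Job], ownership: Dict[str, float]) -> List[Job]:
    """A larger ownership share gives higher priority; sorted() is stable, so ties stay FIFO."""
    return sorted(queue, key=lambda j: -ownership.get(j.group, 0.0))

def select_nodes(job: Job, free_nodes: List[Node]) -> Optional[List[Node]]:
    """Pick nodes for a job, or return None if too few eligible nodes are free."""
    eligible = [n for n in free_nodes if satisfies_services(n, job)]
    eligible.sort(key=lambda n: affinity_rank(n, job))
    if len(eligible) < job.nodes:
        return None
    return eligible[: job.nodes]

# Usage with hypothetical groups and software names.
nodes = [
    Node("n1", "physics", 512, frozenset({"fortran"})),
    Node("n2", "chem", 1024, frozenset({"gaussian"})),
    Node("n3", None, 1024, frozenset({"fortran", "gaussian"})),
    Node("n4", "physics", 256),
]
job = Job("j1", "physics", nodes=2, memory_mb=512)
print([n.name for n in select_nodes(job, nodes)])  # ['n1', 'n3']
```

Here `n4` is excluded by the memory requirement, the physics group's own node `n1` is chosen first, and the central node `n3` is preferred over the chemistry group's `n2`. Ranking other owners' nodes last is one plausible way to keep owners' nodes available for them; the paper may balance this differently.
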
Once the demand from node owners is met, the next important goal is to keep the machine as fully occupied with running jobs as possible, which is accomplished by backfilling. PSCS incorporates the features that are most important to the successful implementation of multi-owner, centrally located, heterogeneous computing facilities.
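
The backfilling step can be sketched with the classic EASY rule, on which PSCS was modeled: the first waiting job that cannot start receives a reservation at the earliest time enough nodes will be free (the "shadow time"), and later jobs may start immediately only if they finish before that time or use only nodes the reserved job will not need. This is a simplified, hypothetical sketch driven by user-supplied runtime estimates; the `Job` and `Running` classes and the `easy_backfill` function are illustrative names, and the actual PSCS policy, which also layers owner affinity on top, may differ.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Job:
    name: str
    nodes: int       # nodes requested
    runtime: float   # user-supplied wall-clock estimate

@dataclass
class Running:
    job: Job
    end_time: float  # estimated completion time

def easy_backfill(queue: List[Job], running: List[Running],
                  free_nodes: int, now: float) -> List[Job]:
    """Return the waiting jobs to start at time `now`, using EASY-style backfilling.

    `queue` is in priority order; `free_nodes` is the number of idle nodes.
    """
    queue = list(queue)
    running = list(running)
    started: List[Job] = []

    # 1. Start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append(Running(job, now + job.runtime))
        started.append(job)
    if not queue:
        return started

    # 2. Reserve nodes for the blocked head job: find the shadow time at which
    #    enough running jobs will have finished, and how many nodes are spare then.
    head = queue[0]
    avail, shadow_time, extra = free_nodes, None, 0
    for r in sorted(running, key=lambda r: r.end_time):
        avail += r.job.nodes
        if avail >= head.nodes:
            shadow_time, extra = r.end_time, avail - head.nodes
            break
    if shadow_time is None:
        return started  # head job can never fit; leave the rest of the queue waiting

    # 3. Backfill later jobs that do not delay the head job's reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        if now + job.runtime <= shadow_time:
            free_nodes -= job.nodes  # finishes before the reservation begins
            started.append(job)
        elif job.nodes <= extra:
            free_nodes -= job.nodes  # runs past the shadow time on spare nodes
            extra -= job.nodes
            started.append(job)
    return started

# Usage on a hypothetical 10-node machine with one 6-node job running until t=100.
running = [Running(Job("R1", 6, 100), end_time=100)]
queue = [Job("A", 8, 50), Job("B", 2, 50), Job("C", 4, 200), Job("D", 2, 500)]
print([j.name for j in easy_backfill(queue, running, free_nodes=4, now=0)])  # ['B', 'D']
```

Job A is reserved to start at t=100. B backfills because it finishes by then, C does not fit in the idle nodes, and D backfills onto the 2 nodes that A will not need, so A's start is not delayed.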

Original language: English (US)
Title of host publication: SC 1998 - Proceedings of the ACM/IEEE Conference on Supercomputing
Publisher: Association for Computing Machinery
ISBN (Electronic): 081868707X
State: Published - 1998
Event: 1998 ACM/IEEE Conference on Supercomputing, SC 1998 - Orlando, United States
Duration: Nov 7, 1998 to Nov 13, 1998

Publication series

Name: Proceedings of the International Conference on Supercomputing
Volume: 1998-November

Conference

Conference: 1998 ACM/IEEE Conference on Supercomputing, SC 1998
Country/Territory: United States
City: Orlando
Period: 11/7/98 to 11/13/98

All Science Journal Classification (ASJC) codes

  • General Computer Science
