TY - GEN
T1 - A collective I/O scheme based on compiler analysis
AU - Kandemir, Mahmut Taylan
N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2000.
PY - 2000
Y1 - 2000
AB - Current approaches to parallel I/O demand extensive user effort to obtain acceptable performance. This is in part due to difficulties in understanding the characteristics of a wide variety of I/O devices and in part due to the inherent complexity of I/O software. While parallel I/O systems provide users with environments in which persistent datasets can be shared between parallel processors, the ultimate performance of I/O-intensive codes depends largely on the relationship between data access patterns and the storage patterns of data in files and on disks. In cases where access patterns and storage patterns match, we can exploit parallel I/O hardware by allowing each processor to perform independent parallel I/O. To handle the cases in which data access patterns and storage patterns do not match, several I/O optimization techniques have been developed in recent years. Collective I/O is one such optimization technique; it enables each processor to perform I/O on behalf of other processors when doing so improves overall performance. While it is generally accepted that collective I/O and its variants can bring impressive improvements in I/O performance, it is difficult for the programmer to apply collective I/O optimally. In this paper, we propose and evaluate a compiler-directed collective I/O approach that detects opportunities for collective I/O and automatically inserts the necessary I/O calls in the code. An important characteristic of the approach is that, instead of applying collective I/O indiscriminately, it applies collective I/O selectively, only in cases where independent parallel I/O would not be possible. We have implemented the necessary algorithms in a source-to-source translator and within a stand-alone tool. Our experimental results demonstrate that our compiler-directed collective I/O scheme performs very well on different setups built using nine applications from several scientific benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=84958968232&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958968232&partnerID=8YFLogxK
U2 - 10.1007/3-540-40889-4_1
DO - 10.1007/3-540-40889-4_1
M3 - Conference contribution
AN - SCOPUS:84958968232
SN - 3540411852
SN - 9783540411857
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 1
EP - 15
BT - Languages, Compilers, and Run-Time Systems for Scalable Computers - 5th International Workshop, LCR 2000, Selected Papers
A2 - Dwarkadas, Sandhya
PB - Springer Verlag
T2 - 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers, LCR 2000
Y2 - 25 May 2000 through 27 May 2000
ER -