TY - GEN
T1 - Profiler and compiler assisted adaptive I/O prefetching for shared storage caches
AU - Son, Seung Woo
AU - Kandemir, Mahmut
AU - Kolcu, Ibrahim
AU - Muralidhara, Sai Prashanth
AU - Ozturk, Ozcan
AU - Karakoy, Mustafa
PY - 2008
Y1 - 2008
N2 - I/O prefetching has been employed in the past as one of the mech- anisms to hide large disk latencies. However, I/O prefetching in parallel applications is problematic when multiple CPUs share the same set of disks due to the possibility that prefetches from different CPUs can interact on shared memory caches in the I/O nodes in complex and unpredictable ways. In this paper, we (i) quantify the impact of compiler-directed I/O prefetching - developed originally in the context of sequential execution - on shared caches at I/O nodes. The experimental data collected shows that while I/O prefetching brings benefits, its effectiveness reduces significantly as the number of CPUs is increased; (ii) identify inter-CPU misses due to harmful prefetches as one of the main sources for this re- duction in performance with the increased number of CPUs; and (iii) propose and experimentally evaluate a profiler and compiler assisted adaptive I/O prefetching scheme targeting shared storage caches. The proposed scheme obtains inter-thread data sharing information using profiling and, based on the captured data sharing patterns, divides the threads into clusters and assigns a separate (customized) I/O prefetcher thread for each cluster. In our approach, the compiler generates the I/O prefetching threads automatically. We implemented this new I/O prefetching scheme using a compiler and the PVFS file system running on Linux, and the empirical data collected clearly underline the importance of adapting I/O prefetching based on program phases. Specifically, our pro- posed scheme improves performance, on average, by 19.9%, 11.9% and 10.3% over the cases without I/O prefetching, with independent I/O prefetching (each CPU is performing compiler-directed I/O prefetching independently), and with one CPU prefetching (one CPU is reserved for prefetching on behalf of others), respectively, when 8 CPUs are used.
AB - I/O prefetching has been employed in the past as one of the mech- anisms to hide large disk latencies. However, I/O prefetching in parallel applications is problematic when multiple CPUs share the same set of disks due to the possibility that prefetches from different CPUs can interact on shared memory caches in the I/O nodes in complex and unpredictable ways. In this paper, we (i) quantify the impact of compiler-directed I/O prefetching - developed originally in the context of sequential execution - on shared caches at I/O nodes. The experimental data collected shows that while I/O prefetching brings benefits, its effectiveness reduces significantly as the number of CPUs is increased; (ii) identify inter-CPU misses due to harmful prefetches as one of the main sources for this re- duction in performance with the increased number of CPUs; and (iii) propose and experimentally evaluate a profiler and compiler assisted adaptive I/O prefetching scheme targeting shared storage caches. The proposed scheme obtains inter-thread data sharing information using profiling and, based on the captured data sharing patterns, divides the threads into clusters and assigns a separate (customized) I/O prefetcher thread for each cluster. In our approach, the compiler generates the I/O prefetching threads automatically. We implemented this new I/O prefetching scheme using a compiler and the PVFS file system running on Linux, and the empirical data collected clearly underline the importance of adapting I/O prefetching based on program phases. Specifically, our pro- posed scheme improves performance, on average, by 19.9%, 11.9% and 10.3% over the cases without I/O prefetching, with independent I/O prefetching (each CPU is performing compiler-directed I/O prefetching independently), and with one CPU prefetching (one CPU is reserved for prefetching on behalf of others), respectively, when 8 CPUs are used.
UR - http://www.scopus.com/inward/record.url?scp=63549141550&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=63549141550&partnerID=8YFLogxK
U2 - 10.1145/1454115.1454133
DO - 10.1145/1454115.1454133
M3 - Conference contribution
AN - SCOPUS:63549141550
SN - 9781605582825
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 112
EP - 121
BT - PACT'08
T2 - 17th International Conference on Parallel Architectures and Compilation Techniques, PACT 2008
Y2 - 25 October 2008 through 29 October 2008
ER -