TY - GEN
T1 - Conjugate gradient sparse solvers
T2 - 20th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2006
AU - Malkowski, Korad
AU - Lee, Ingyu
AU - Raghavan, Padma
AU - Irwin, Mary Jane
PY - 2006
Y1 - 2006
N2 - We characterize the performance and power attributes of the conjugate gradient (CG) sparse solver which is widely used in scientific applications. We use cycle-accurate simulations with SimpleScalar and Wattch, on a processor and memory architecture similar to the configuration of a node of the BlueGene/L. We first demonstrate that substantial power savings can be obtained without performance degradation if low power modes of caches can be utilized. We next show that if Dynamic Voltage Scaling (DVS) can be used, power and energy savings are possible, but these are realized only at the expense of performance penalties. We then consider two simple memory subsystem optimizations, namely memory and level-2 cache prefetching. We demonstrate that when DVS and low power modes of caches are used with these optimizations, performance can be improved significantly with reductions in power and energy. For example, execution time is reduced by 23%, power by 55% and energy by 65% in the final configuration at 500MHz relative to the original at 1GHz. We also use our codes and the CG NAS benchmark code to demonstrate that performance and power profiles can vary significantly depending on matrix properties and the level of code tuning. These results indicate that architectural evaluations can benefit if traditional benchmarks are augmented with codes more representative of tuned scientific applications.
AB - We characterize the performance and power attributes of the conjugate gradient (CG) sparse solver which is widely used in scientific applications. We use cycle-accurate simulations with SimpleScalar and Wattch, on a processor and memory architecture similar to the configuration of a node of the BlueGene/L. We first demonstrate that substantial power savings can be obtained without performance degradation if low power modes of caches can be utilized. We next show that if Dynamic Voltage Scaling (DVS) can be used, power and energy savings are possible, but these are realized only at the expense of performance penalties. We then consider two simple memory subsystem optimizations, namely memory and level-2 cache prefetching. We demonstrate that when DVS and low power modes of caches are used with these optimizations, performance can be improved significantly with reductions in power and energy. For example, execution time is reduced by 23%, power by 55% and energy by 65% in the final configuration at 500MHz relative to the original at 1GHz. We also use our codes and the CG NAS benchmark code to demonstrate that performance and power profiles can vary significantly depending on matrix properties and the level of code tuning. These results indicate that architectural evaluations can benefit if traditional benchmarks are augmented with codes more representative of tuned scientific applications.
UR - http://www.scopus.com/inward/record.url?scp=33847142869&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33847142869&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2006.1639595
DO - 10.1109/IPDPS.2006.1639595
M3 - Conference contribution
AN - SCOPUS:33847142869
SN - 1424400546
SN - 9781424400546
T3 - 20th International Parallel and Distributed Processing Symposium, IPDPS 2006
BT - 20th International Parallel and Distributed Processing Symposium, IPDPS 2006
PB - IEEE Computer Society
Y2 - 25 April 2006 through 29 April 2006
ER -