TY - GEN
T1 - Short-Circuiting Memory Traffic in Handheld Platforms
AU - Yedlapalli, Praveen
AU - Nachiappan, Nachiappan Chidambaram
AU - Soundararajan, Niranjan
AU - Sivasubramaniam, Anand
AU - Kandemir, Mahmut
AU - Das, Chitaranjan
PY - 2015/1/15
Y1 - 2015/1/15
N2 - Handheld devices are ubiquitous in today's world. With their advent, we also see a tremendous increase in device-user interactivity and real-time data processing needs. Media (audio/video/camera) and gaming use-cases are gaining substantial user attention and are defining product successes. The combination of increasing demand from these use-cases and having to run them at low power (from a battery) means that architects have to carefully study the applications and optimize the hardware and software stack together to gain significant optimizations. In this work, we study workloads from these domains and identify the memory subsystem (system agent) to be a critical bottleneck to performance scaling. We characterize the lifetime of the "frame-based" data used in these workloads through the system and show that, by communicating at frame granularity, we miss significant performance optimization opportunities, caused by large IP-to-IP data reuse distances. By carefully breaking these frames into sub-frames, while maintaining correctness, we demonstrate substantial gains with limited hardware requirements. Specifically, we evaluate two techniques, flow-buffering and IP-IP short-circuiting, and show that these techniques bring both power-performance benefits and enhanced user experience.
UR - http://www.scopus.com/inward/record.url?scp=84937697128&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937697128&partnerID=8YFLogxK
U2 - 10.1109/MICRO.2014.60
DO - 10.1109/MICRO.2014.60
M3 - Conference contribution
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 166
EP - 177
BT - Proceedings - 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014
PB - IEEE Computer Society
T2 - 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2014
Y2 - 13 December 2014 through 17 December 2014
ER -