TY - GEN
T1 - CritICs critiquing criticality in mobile apps
AU - Rengasamy, Prasanna Venkatesh
AU - Zhang, Haibo
AU - Zhao, Shulin
AU - Nachiappan, Nachiappan Chidambaram
AU - Sivasubramaniam, Anand
AU - Kandemir, Mahmut T.
AU - Das, Chita R.
N1 - Funding Information:
This work has been supported in part by NSF grants 1439021, 1629915, 1526750, 1629129, 1763681, 1409095, 1317560, 1320478, 182293, 162651 and 1439057
Funding Information:
ACKNOWLEDGMENT This work has been supported in part by NSF grants 1439021, 1629915, 1526750, 1629129, 1763681, 1409095, 1317560, 1320478, 182293, 162651 and 1439057, and a DARPA/SRC JUMP grant. We would also like to thank Jack Sampson for his feedback on this paper.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/12
Y1 - 2018/12/12
N2 - In this paper, we conduct a systematic analysis to show that existing CPU optimizations targeting scientific/server workloads are not always well suited for mobile apps. In particular, we observe that the well-known and very important concept of identifying and accelerating individual critical instructions in workloads such as SPEC, are not as effective for mobile apps. Several differences in mobile app characteristics including (i) dependencies between critical instructions interspersed with non-critical instructions in the dependence chain, (ii) temporal proximity of the critical instructions in the dynamic stream, and (iii) the bottleneck shifting to the front from the rear of the datapath pipeline, are key contributors to the ineffectiveness of traditional criticality based optimizations. Instead, we propose the concept of Critical Instruction Chains (CritICs)-which are short, critical and self contained sequences of instructions, for aggregate level optimization. With motivating results, we show that an offline profiler/analysis framework can easily identify these CritICs, and we propose a very simple software mechanism in the compiler that exploits ARM's 16-bit ISA format to nearly double the fetch bandwidth of these instructions. We have implemented this entire framework-both profiler and compiler passes, and evaluated its effectiveness for 10 popular apps from the Play Store. Experimental evaluations show that our approach is much more effective than two previously studied criticality optimizations, yielding a speedup of 12.65%, and energy savings of 15% in the CPU (translating to a system wide energy savings of 4.6%), requiring very little additional hardware support.
AB - In this paper, we conduct a systematic analysis to show that existing CPU optimizations targeting scientific/server workloads are not always well suited for mobile apps. In particular, we observe that the well-known and very important concept of identifying and accelerating individual critical instructions in workloads such as SPEC, are not as effective for mobile apps. Several differences in mobile app characteristics including (i) dependencies between critical instructions interspersed with non-critical instructions in the dependence chain, (ii) temporal proximity of the critical instructions in the dynamic stream, and (iii) the bottleneck shifting to the front from the rear of the datapath pipeline, are key contributors to the ineffectiveness of traditional criticality based optimizations. Instead, we propose the concept of Critical Instruction Chains (CritICs)-which are short, critical and self contained sequences of instructions, for aggregate level optimization. With motivating results, we show that an offline profiler/analysis framework can easily identify these CritICs, and we propose a very simple software mechanism in the compiler that exploits ARM's 16-bit ISA format to nearly double the fetch bandwidth of these instructions. We have implemented this entire framework-both profiler and compiler passes, and evaluated its effectiveness for 10 popular apps from the Play Store. Experimental evaluations show that our approach is much more effective than two previously studied criticality optimizations, yielding a speedup of 12.65%, and energy savings of 15% in the CPU (translating to a system wide energy savings of 4.6%), requiring very little additional hardware support.
UR - http://www.scopus.com/inward/record.url?scp=85060005773&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060005773&partnerID=8YFLogxK
U2 - 10.1109/MICRO.2018.00075
DO - 10.1109/MICRO.2018.00075
M3 - Conference contribution
AN - SCOPUS:85060005773
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 867
EP - 880
BT - Proceedings - 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2018
PB - IEEE Computer Society
T2 - 51st Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2018
Y2 - 20 October 2018 through 24 October 2018
ER -