TY - GEN
T1 - FPRA: A Fine-grained Parallel RRAM Architecture
T2 - 2021 IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2021
AU - Liu, Xiao
AU - Zhou, Minxuan
AU - Ausavarungnirun, Rachata
AU - Eilert, Sean
AU - Akel, Ameen
AU - Rosing, Tajana
AU - Narayanan, Vijaykrishnan
AU - Zhao, Jishen
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/7/26
Y1 - 2021/7/26
AB - Emerging resistive memory (RRAM)-based crossbar arrays are a promising technology for accelerating neural network applications. RRAM-based CNN accelerators support a high degree of intra-layer and inter-layer parallelism: intra-layer parallelism duplicates kernels within each network layer, while inter-layer parallelism allows each layer to execute as soon as a portion of its input data is available. However, previously proposed RRAM-based accelerators do not leverage data sharing between duplicate kernels, leading to significant idleness of crossbar arrays during inference. This shared data creates data dependencies that stall the processing of the next layer in the pipeline. To address these issues, we propose the Fine-grained Parallel RRAM Architecture (FPRA), a novel architectural design that improves parallelism for pipeline-enabled RRAM-based accelerators. FPRA addresses the data sharing issue with kernel batching and data-sharing-aware memory. Kernel batching rearranges the layout of the kernels and minimizes the data dependencies created by shared input data. The data-sharing-aware memory uniformly buffers the input and output data for each layer, efficiently dispatching data to duplicate kernels while reducing the amount of data transferred between layers. We evaluate FPRA on eight popular image recognition CNN models with various configurations in a cycle-accurate simulator. We find that FPRA achieves a 2.0× average latency speedup and a 2.1× average throughput increase compared to state-of-the-art RRAM-based accelerators.
UR - http://www.scopus.com/inward/record.url?scp=85114337244&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114337244&partnerID=8YFLogxK
U2 - 10.1109/ISLPED52811.2021.9502474
DO - 10.1109/ISLPED52811.2021.9502474
M3 - Conference contribution
AN - SCOPUS:85114337244
T3 - Proceedings of the International Symposium on Low Power Electronics and Design
BT - 2021 IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 26 July 2021 through 28 July 2021
ER -