Exploiting Frame Similarity for Efficient Inference on Edge Devices

Ziyu Ying, Shulin Zhao, Haibo Zhang, Cyan Subhra Mishra, Sandeepa Bhuyan, Mahmut T. Kandemir, Anand Sivasubramaniam, Chita R. Das

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Deep neural networks (DNNs) are widely used in computer vision tasks because they can achieve very high accuracy. However, the large number of parameters in DNNs can lead to long inference times for vision tasks, which makes it even more challenging to deploy them on compute- and memory-constrained mobile/edge devices. To speed up DNN inference, some existing works employ compression (model pruning or quantization) or enhanced hardware. However, most prior works focus on improving the model structure or implementing custom accelerators. In contrast, in this paper we target the video data processed by edge devices and study the similarity between frames. Based on this, we propose two runtime approaches that boost the performance of the inference process while achieving high accuracy. Specifically, exploiting the similarity between successive video frames, we propose a frame-level compute reuse algorithm based on the motion vectors of each frame. With frame-level reuse, we are able to skip 53% of frames during inference with negligible overhead, while keeping the mAP (accuracy) drop below 1% for the object detection task. Additionally, we implement a partial inference scheme to enable region/tile-level reuse. Our experiments on a representative mobile device (Pixel 3 phone) show that the proposed partial inference scheme achieves a 2× speedup over the baseline approach that performs full inference on every frame. We integrate these two data reuse algorithms to accelerate neural network inference and improve its energy efficiency. More specifically, for each frame in the video, we can dynamically select between (i) performing a full inference, (ii) performing a partial inference, or (iii) skipping the inference altogether.
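The abstract does not spell out the per-frame decision rule. The following is a minimal Python sketch, under our own assumptions, of how a choice among full inference, partial inference, and skipping could be driven by motion vectors. It assumes one (dx, dy) motion vector per macroblock (as a video decoder would expose), groups macroblocks into square tiles, and uses hypothetical thresholds (`mv_threshold`, `skip_ratio`, `partial_ratio`) chosen only for illustration; they are not values or names from the paper.

```python
import math
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    FULL = "full"        # run the detector on the whole frame
    PARTIAL = "partial"  # re-run the detector only on changed tiles
    SKIP = "skip"        # reuse the previous frame's results


@dataclass
class Decision:
    action: Action
    # Tiles (tile_row, tile_col) to re-infer; only populated for PARTIAL.
    changed_tiles: list = field(default_factory=list)


def moving_blocks_per_tile(mvs, tile_size, mv_threshold):
    """Count, per tile, the macroblocks whose motion magnitude exceeds mv_threshold.

    mvs is a 2D grid (rows x cols) of (dx, dy) motion vectors, one per macroblock.
    Every tile appears in the result, even if none of its blocks move.
    """
    counts = {}
    for r, row in enumerate(mvs):
        for c, (dx, dy) in enumerate(row):
            tile = (r // tile_size, c // tile_size)
            counts.setdefault(tile, 0)
            if math.hypot(dx, dy) > mv_threshold:
                counts[tile] += 1
    return counts


def choose_action(mvs, tile_size=4, mv_threshold=1.0, skip_ratio=0.02, partial_ratio=0.5):
    """Hypothetical three-way policy: SKIP, PARTIAL, or FULL for one frame."""
    total_blocks = sum(len(row) for row in mvs)
    if total_blocks == 0:
        raise ValueError("empty motion-vector grid")
    counts = moving_blocks_per_tile(mvs, tile_size, mv_threshold)

    # Almost nothing moved: reuse the previous frame's detections.
    if sum(counts.values()) / total_blocks <= skip_ratio:
        return Decision(Action.SKIP)

    # Motion is confined to a minority of tiles: re-infer only those tiles.
    changed = sorted(t for t, n in counts.items() if n > 0)
    if len(changed) / len(counts) <= partial_ratio:
        return Decision(Action.PARTIAL, changed)

    # Widespread motion: fall back to full inference.
    return Decision(Action.FULL)


if __name__ == "__main__":
    still = [[(0, 0)] * 8 for _ in range(8)]
    local = [[(3, 0) if r < 4 and c < 4 else (0, 0) for c in range(8)] for r in range(8)]
    print(choose_action(still).action, choose_action(local))
```

In the usage at the bottom, a static 8×8 grid yields SKIP, while motion confined to the top-left 4×4 region touches one of four tiles and yields PARTIAL with only tile (0, 0) marked for re-inference. Ordering the checks from cheapest action to most expensive keeps the common static case fast; a real system would also need to track accumulated drift so that reused results are periodically refreshed.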
Our experimental evaluations using six different videos reveal that, compared to the conventional scheme that performs full inference on every frame, the proposed schemes are up to 80% (56% on average) more energy efficient and deliver 2.2× better performance, while losing less than 2% accuracy. Additionally, the experimental analysis indicates that our approach outperforms the state-of-the-art work with respect to accuracy and/or performance/energy savings.

Original language: English (US)
Title of host publication: Proceedings - 2022 IEEE 42nd International Conference on Distributed Computing Systems, ICDCS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1073-1084
Number of pages: 12
ISBN (Electronic): 9781665471770
State: Published - 2022
Event: 42nd IEEE International Conference on Distributed Computing Systems, ICDCS 2022 - Bologna, Italy
Duration: Jul 10 2022 to Jul 13 2022

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2022-July

Conference

Conference: 42nd IEEE International Conference on Distributed Computing Systems, ICDCS 2022
Country/Territory: Italy
City: Bologna
Period: 7/10/22 to 7/13/22

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
