TY - GEN
T1 - Extending Action Recognition in the Compressed Domain
AU - Abrams, Samuel
AU - Narayanan, Vijaykrishnan
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - As the internet continues to extend its reach into every facet of society, video is becoming one of the most common mediums for the communication of ideas and information. Working with video in the compressed domain saves resources by avoiding decompression and allows for faster processing, because the input streams are smaller and carry less redundancy. Most prior work on compressed video processing has focused on the MPEG-4 part-2 codec, which is dated in part due to its non-optimized compression ratio. For example, H.264 has a compression ratio 2x greater than that of the MPEG-4 part-2 codec, with improved quality. Due to the increasing prevalence of the more effective H.264 codec for video content, designing a network to infer directly on H.264 compressed video is essential. Hence, we propose a new video analytics architecture that uses only two streams of data from the compressed domain, as compared to the three or more generally used for compressed recognition. The proposed architecture, coined Extended Codec Recognition Network (ECRN), is, to our knowledge, the first approach to support action recognition on both MPEG-4 part-2 and H.264 compressed video. It is computationally efficient and achieves accuracy competitive with methods performing recognition solely on MPEG-4 part-2 streams. The ability to achieve competitive accuracy using a modern video codec creates the potential to extend compressed action recognition to a wide range of applications.
AB - As the internet continues to extend its reach into every facet of society, video is becoming one of the most common mediums for the communication of ideas and information. Working with video in the compressed domain saves resources by avoiding decompression and allows for faster processing, because the input streams are smaller and carry less redundancy. Most prior work on compressed video processing has focused on the MPEG-4 part-2 codec, which is dated in part due to its non-optimized compression ratio. For example, H.264 has a compression ratio 2x greater than that of the MPEG-4 part-2 codec, with improved quality. Due to the increasing prevalence of the more effective H.264 codec for video content, designing a network to infer directly on H.264 compressed video is essential. Hence, we propose a new video analytics architecture that uses only two streams of data from the compressed domain, as compared to the three or more generally used for compressed recognition. The proposed architecture, coined Extended Codec Recognition Network (ECRN), is, to our knowledge, the first approach to support action recognition on both MPEG-4 part-2 and H.264 compressed video. It is computationally efficient and achieves accuracy competitive with methods performing recognition solely on MPEG-4 part-2 streams. The ability to achieve competitive accuracy using a modern video codec creates the potential to extend compressed action recognition to a wide range of applications.
UR - http://www.scopus.com/inward/record.url?scp=85153896549&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85153896549&partnerID=8YFLogxK
U2 - 10.1109/VLSID57277.2023.00058
DO - 10.1109/VLSID57277.2023.00058
M3 - Conference contribution
AN - SCOPUS:85153896549
T3 - Proceedings of the IEEE International Conference on VLSI Design
SP - 246
EP - 251
BT - Proceedings - 36th International Conference on VLSI Design, VLSID 2023 - held concurrently with 22nd International Conference on Embedded Systems, ES 2023
PB - IEEE Computer Society
T2 - 36th International Conference on VLSI Design, VLSID 2023
Y2 - 8 January 2023 through 12 January 2023
ER -