TY - GEN
T1 - Visual co-occurrence network
T2 - 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia, ESTIMedia 2015
AU - Advani, Siddharth
AU - Smith, Brigid
AU - Tanabe, Yasuki
AU - Irick, Kevin
AU - Cotter, Matthew
AU - Sampson, Jack
AU - Narayanan, Vijaykrishnan
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/12/9
Y1 - 2015/12/9
N2 - In any visual object recognition system, classification accuracy will likely determine the usefulness of the system as a whole. In many real-world applications, it is also important to recognize a large number of diverse objects if the system is to be robust enough to handle the sort of tasks that the human visual system handles on an average day. These objectives are often at odds with performance, as running too many detectors on any one scene is prohibitively slow for any real-time scenario. However, visual information has temporal and spatial context that can be exploited to reduce the number of detectors that need to be triggered at any given instant. In this paper, we propose a dynamic approach to encoding such context, called the Visual Co-occurrence Network (ViCoNet), which establishes relationships between objects observed in a visual scene. We investigate the utility of ViCoNet when integrated into a vision pipeline targeted at retail shopping. When evaluated on a large and deep dataset, we achieve a 50% improvement in performance and a 7% improvement in accuracy in the best case, and a 45% improvement in performance and a 3% improvement in accuracy in the average case, over an established baseline. The memory overhead of ViCoNet is around 10 KB, highlighting its effectiveness on temporal big data.
AB - In any visual object recognition system, classification accuracy will likely determine the usefulness of the system as a whole. In many real-world applications, it is also important to recognize a large number of diverse objects if the system is to be robust enough to handle the sort of tasks that the human visual system handles on an average day. These objectives are often at odds with performance, as running too many detectors on any one scene is prohibitively slow for any real-time scenario. However, visual information has temporal and spatial context that can be exploited to reduce the number of detectors that need to be triggered at any given instant. In this paper, we propose a dynamic approach to encoding such context, called the Visual Co-occurrence Network (ViCoNet), which establishes relationships between objects observed in a visual scene. We investigate the utility of ViCoNet when integrated into a vision pipeline targeted at retail shopping. When evaluated on a large and deep dataset, we achieve a 50% improvement in performance and a 7% improvement in accuracy in the best case, and a 45% improvement in performance and a 3% improvement in accuracy in the average case, over an established baseline. The memory overhead of ViCoNet is around 10 KB, highlighting its effectiveness on temporal big data.
UR - http://www.scopus.com/inward/record.url?scp=84962295258&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84962295258&partnerID=8YFLogxK
U2 - 10.1109/ESTIMedia.2015.7351774
DO - 10.1109/ESTIMedia.2015.7351774
M3 - Conference contribution
AN - SCOPUS:84962295258
T3 - ESTIMedia 2015 - 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia
BT - ESTIMedia 2015 - 13th IEEE Symposium on Embedded Systems for Real-Time Multimedia
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 8 October 2015 through 9 October 2015
ER -