TY - GEN
T1 - Local expert forest of score fusion for video event classification
AU - Liu, Jingchen
AU - McCloskey, Scott
AU - Liu, Yanxi
PY - 2012
Y1 - 2012
N2 - We address the problem of complicated event categorization from a large dataset of videos "in the wild", where multiple classifiers are applied independently to evaluate each video with a 'likelihood' score. The core contribution of this paper is a local expert forest model for meta-level score fusion for event detection under heavily imbalanced class distributions. Our motivation is to adapt to performance variations of the classifiers in different regions of the score space, using a divide-and-conquer technique. We propose a novel method to partition the likelihood-space, being sensitive to local label distributions in imbalanced data, and train a pair of locally optimized experts each time. Multiple pairs of experts based on different partitions ('trees') form a 'forest', balancing local adaptivity and over-fitting of the model. As a result, our model disregards classifiers in regions of the score space where their performance is bad, achieving both local source selection and fusion. We experiment with the TRECVID Multimedia Event Detection (MED) dataset, detecting 15 complicated events from around 34k video clips comprising more than 1000 hours, and demonstrate superior performance compared to other score-level fusion methods.
UR - http://www.scopus.com/inward/record.url?scp=84867855153&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867855153&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-33715-4_29
DO - 10.1007/978-3-642-33715-4_29
M3 - Conference contribution
AN - SCOPUS:84867855153
SN - 9783642337147
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 397
EP - 410
BT - Computer Vision, ECCV 2012 - 12th European Conference on Computer Vision, Proceedings
T2 - 12th European Conference on Computer Vision, ECCV 2012
Y2 - 7 October 2012 through 13 October 2012
ER -