TY - GEN
T1 - Comparative study on subject classification of academic videos using noisy transcripts
AU - Chang, Hau Wen
AU - Kim, Hung Sik
AU - Li, Shuyang
AU - Lee, Jeongkyu
AU - Lee, Dongwon
PY - 2010
Y1 - 2010
N2 - With the advance of Web technologies, the number of "academic" videos available on the Web (e.g., online lectures, web seminars, conference presentations, or tutorial videos) has increased explosively. A fundamental task in managing such videos is to classify them into relevant subjects. For this task, most current content providers rely on keywords to perform the classification, while active techniques for automatic video classification focus on utilizing multi-modal features. However, in our setting, we argue that neither approach is sufficient to solve the problem effectively. The keyword-based method is very limited in accuracy, while the feature-based one lacks the semantics to represent academic subjects. To address this problem, we propose to transform the video subject classification problem into a text categorization problem by exploiting the extracted transcripts of videos. Using both real and synthesized data, (1) we extensively study the validity of the proposed idea, (2) we analyze the performance of different text categorization methods, and (3) we study the impact of various transcript factors, such as quality and length, on the academic video classification problem.
AB - With the advance of Web technologies, the number of "academic" videos available on the Web (e.g., online lectures, web seminars, conference presentations, or tutorial videos) has increased explosively. A fundamental task in managing such videos is to classify them into relevant subjects. For this task, most current content providers rely on keywords to perform the classification, while active techniques for automatic video classification focus on utilizing multi-modal features. However, in our setting, we argue that neither approach is sufficient to solve the problem effectively. The keyword-based method is very limited in accuracy, while the feature-based one lacks the semantics to represent academic subjects. To address this problem, we propose to transform the video subject classification problem into a text categorization problem by exploiting the extracted transcripts of videos. Using both real and synthesized data, (1) we extensively study the validity of the proposed idea, (2) we analyze the performance of different text categorization methods, and (3) we study the impact of various transcript factors, such as quality and length, on the academic video classification problem.
UR - http://www.scopus.com/inward/record.url?scp=79952044115&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952044115&partnerID=8YFLogxK
U2 - 10.1109/ICSC.2010.91
DO - 10.1109/ICSC.2010.91
M3 - Conference contribution
AN - SCOPUS:79952044115
SN - 9780769541549
T3 - Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010
SP - 67
EP - 72
BT - Proceedings - 2010 IEEE 4th International Conference on Semantic Computing, ICSC 2010
T2 - 4th IEEE International Conference on Semantic Computing, ICSC 2010
Y2 - 22 September 2010 through 24 September 2010
ER -