TY - GEN
T1 - Costco
T2 - 12th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2011
AU - Yan, Su
AU - Lee, Dongwon
AU - Wang, Alex Hai
PY - 2011
Y1 - 2011
N2 - Connectivity analysis of networked documents provides high-quality link structure information, which is usually lost on a content-based learning system. It is well known that combining links and content has the potential to improve text analysis. However, exploiting link structure is non-trivial because links are often noisy and sparse. Moreover, it is difficult to balance term-based content analysis and link-based structure analysis to reap the benefits of both. We introduce a novel networked document clustering technique that integrates content and link information in a unified optimization framework. Under this framework, a novel dimensionality reduction method called COntent & STructure COnstrained (Costco) Feature Projection is developed. To extract robust link information from sparse and noisy link graphs, two link analysis methods are introduced. Experiments on benchmark data and diverse real-world text corpora validate the effectiveness of the proposed methods.
UR - http://www.scopus.com/inward/record.url?scp=79952265019&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952265019&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-19437-5_24
DO - 10.1007/978-3-642-19437-5_24
M3 - Conference contribution
AN - SCOPUS:79952265019
SN - 9783642194368
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 289
EP - 300
BT - Computational Linguistics and Intelligent Text Processing - 12th International Conference, CICLing 2011, Proceedings
Y2 - 20 February 2011 through 26 February 2011
ER -