TY - GEN
T1 - Extracting author meta-data from web using visual features
AU - Zheng, Shuyi
AU - Zhou, Ding
AU - Li, Jia
AU - Giles, C. Lee
PY - 2007
Y1 - 2007
N2 - Enriching a digital library's author meta-data can lead to valuable services and applications. This paper addresses the problem of extracting authors' information from their homepages, which it treats as a multiclass classification problem. A homepage can be viewed as a group of information pieces that need to be classified into different fields, e.g., Name, Title, Affiliation, and Email. Each information piece can be viewed as a point in a feature space, and certain patterns can also be observed among the different fields on a page. To improve the extraction accuracy, this paper argues that visual features of information pieces on a homepage should be fully utilized. In addition, it proposes an inter-fields probability model to capture the relations among different fields; this model can be combined with feature-space based classification. Experimental results demonstrate that utilizing visual features and applying the inter-fields probability model can significantly improve the extraction accuracy.
UR - http://www.scopus.com/inward/record.url?scp=49549101342&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49549101342&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2007.59
DO - 10.1109/ICDMW.2007.59
M3 - Conference contribution
AN - SCOPUS:49549101342
SN - 0769530192
SN - 9780769530192
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 33
EP - 38
BT - ICDM Workshops 2007 - Proceedings of the 7th IEEE International Conference on Data Mining Workshops
T2 - 7th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007
Y2 - 28 October 2007 through 31 October 2007
ER -