TY - GEN
T1 - Simultaneous product attribute name and value extraction from web pages
AU - Wu, Bo
AU - Cheng, Xueqi
AU - Wang, Yu
AU - Guo, Yan
AU - Song, Linhai
PY - 2009
Y1 - 2009
N2 - Much work has been done in the area of template-independent web data extraction. However, these approaches deal with the attribute value extraction and annotation either in separate phases or constrained to a predefined set of attributes which is highly ineffective. In this paper, we perform the attribute extraction and annotation simultaneously by extracting the attribute name and value pair at the same time. In our approach, we use a co-training algorithm with naive Bayesian classifier to identify the candidate attribute name and value pairs in the unlabeled pages. The candidate attribute name and value pairs are used to detect the specification block of the product in web pages. Finally, all the attribute name and value pairs in the specification block are discovered. We conduct experiments for three types of products and obtain a promising result.
AB - Much work has been done in the area of template-independent web data extraction. However, these approaches deal with the attribute value extraction and annotation either in separate phases or constrained to a predefined set of attributes which is highly ineffective. In this paper, we perform the attribute extraction and annotation simultaneously by extracting the attribute name and value pair at the same time. In our approach, we use a co-training algorithm with naive Bayesian classifier to identify the candidate attribute name and value pairs in the unlabeled pages. The candidate attribute name and value pairs are used to detect the specification block of the product in web pages. Finally, all the attribute name and value pairs in the specification block are discovered. We conduct experiments for three types of products and obtain a promising result.
UR - http://www.scopus.com/inward/record.url?scp=84863116052&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863116052&partnerID=8YFLogxK
U2 - 10.1109/WI-IAT.2009.286
DO - 10.1109/WI-IAT.2009.286
M3 - Conference contribution
AN - SCOPUS:84863116052
SN - 9780769538013
T3 - Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2009
SP - 295
EP - 298
BT - Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2009
T2 - 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2009
Y2 - 15 September 2009 through 18 September 2009
ER -