Simultaneous product attribute name and value extraction from web pages

Bo Wu, Xueqi Cheng, Yu Wang, Yan Guo, Linhai Song

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Scopus citations

Abstract

Much work has been done in the area of template-independent web data extraction. However, existing approaches either handle attribute value extraction and annotation in separate phases or are constrained to a predefined set of attributes, which is highly ineffective. In this paper, we perform attribute extraction and annotation simultaneously by extracting each attribute name and value pair at the same time. Our approach uses a co-training algorithm with a naive Bayesian classifier to identify candidate attribute name and value pairs in unlabeled pages. These candidate pairs are then used to detect the specification block of the product in each web page. Finally, all the attribute name and value pairs in the specification block are discovered. We conduct experiments on three types of products and obtain promising results.
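The abstract outlines a three-step pipeline (co-training to find candidate pairs, locating the specification block, harvesting every pair in that block) but gives no implementation details. The sketch below is a minimal, hypothetical illustration under assumed choices, not the authors' method. Candidates are assumed to be pre-extracted (name, value) pairs. The two co-training views are assumed to be the words of the attribute name and a layout cue (tag and separator). Each view gets a multinomial naive Bayes classifier with add-one smoothing. The specification block is assumed to be the block holding the most positively classified pairs. The names `Candidate`, `co_train`, `extract_specification`, the view features, the confidence threshold, and the round counts are all illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class Candidate:
    """A candidate attribute (name, value) pair found in a page (hypothetical model)."""
    name: str
    value: str
    block: str          # assumed id of the enclosing page block, e.g. a <table> or <div>
    text_view: list     # view 1 (assumed): words of the attribute name
    layout_view: list   # view 2 (assumed): structural cues such as tag and separator

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over token lists."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab_size = len({t for c in self.classes for t in self.counts[c]})
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        logs = {}
        for c in self.classes:
            s = math.log(self.prior[c] / self.n)
            for t in doc:
                s += math.log((self.counts[c][t] + 1) / (self.totals[c] + self.vocab_size))
            logs[c] = s
        top = max(logs.values())  # shift by the max before exp() to avoid underflow
        exp = {c: math.exp(s - top) for c, s in logs.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

VIEWS = ("text_view", "layout_view")

def train(labeled, view):
    """Fit a naive Bayes classifier on one view of the labeled candidates."""
    return NaiveBayes().fit([getattr(c, view) for c, _ in labeled], [y for _, y in labeled])

def co_train(labeled, unlabeled, rounds=5, per_round=1, threshold=0.7):
    """Each view's classifier moves its most confident unlabeled candidates into the
    shared labeled set, so the other view is retrained on them in the next step."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in VIEWS:
            if not pool:
                return labeled, pool
            clf = train(labeled, view)
            scored = []
            for cand in pool:
                proba = clf.predict_proba(getattr(cand, view))
                label = max(proba, key=proba.get)
                if proba[label] >= threshold:
                    scored.append((proba[label], label, cand))
            scored.sort(key=lambda s: s[0], reverse=True)
            for _, label, cand in scored[:per_round]:
                labeled.append((cand, label))
                pool.remove(cand)
    return labeled, pool

def extract_specification(labeled, unlabeled):
    """Co-train, pick the block holding the most positive pairs, return every pair in it."""
    labeled, pool = co_train(labeled, unlabeled)
    clfs = [(view, train(labeled, view)) for view in VIEWS]
    decided = list(labeled)
    for cand in pool:
        # Leftover candidates: multiply the two views' posteriors for each class.
        p = {0: 1.0, 1: 1.0}
        for view, clf in clfs:
            proba = clf.predict_proba(getattr(cand, view))
            for y in p:
                p[y] *= proba.get(y, 0.0)
        decided.append((cand, 1 if p[1] > p[0] else 0))
    votes = Counter(c.block for c, y in decided if y == 1)
    if not votes:
        return []
    spec_block = votes.most_common(1)[0][0]
    # Return every pair in that block, regardless of its individual label.
    return [(c.name, c.value) for c, _ in decided if c.block == spec_block]

if __name__ == "__main__":
    seeds = [(Candidate("Weight", "1.2 kg", "spec", ["weight"], ["td", "colon"]), 1),
             (Candidate("Home", "Products", "nav", ["home"], ["a", "pipe"]), 0)]
    pages = [Candidate("Screen", "15 in", "spec", ["screen"], ["td", "colon"]),
             Candidate("Color", "Black", "spec", ["color"], ["td", "colon"]),
             Candidate("Login", "Account", "nav", ["login"], ["a", "pipe"])]
    print(extract_specification(seeds, pages))
```

On the toy input, the name words of the unlabeled pairs are unseen, so the text view stays below the confidence threshold, while the layout view propagates the seed label from `Weight` to the other `td`/`colon` pairs. The `spec` block should therefore win the vote, and its three pairs should be returned. The threshold is a design choice in this sketch: it keeps a view from feeding the other view guesses that are barely better than chance.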

Original language: English (US)
Title of host publication: Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2009
Pages: 295-298
Number of pages: 4
State: Published - 2009
Event: 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2009 - Milano, Italy
Duration: Sep 15 2009 to Sep 18 2009

Publication series

Name: Proceedings - 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2009
Volume: 3

Other

Other: 2009 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2009
Country/Territory: Italy
City: Milano
Period: 9/15/09 to 9/18/09

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Software
