Crowd-sourcing Web knowledge for metadata extraction

Zhaohui Wu, Wenyi Huang, Chen Liang, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

We explore a new metadata extraction framework that requires no human annotators, with the ground truth harvested from the Web. A new training sample is selected based not only on its uncertainty and representativeness in the unlabeled pool, but also on its availability and credibility in Web knowledge bases. We construct a dataset of 4329 books with valid metadata and evaluate our approach using 5 Web book databases as oracles. Empirical results demonstrate its effectiveness and efficiency.

Original language: English (US)
Title of host publication: 2014 IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 141-144
Number of pages: 4
ISBN (Electronic): 9781479955695
State: Published - Dec 1 2014
Event: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014 - London, United Kingdom
Duration: Sep 8 2014 → Sep 12 2014

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
Country/Territory: United Kingdom
City: London
Period: 9/8/14 → 9/12/14

All Science Journal Classification (ASJC) codes

  • General Engineering
