TableSeer: Automatic table metadata extraction and searching in digital libraries

Ying Liu, Kun Bai, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

130 Scopus citations

Abstract

Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make the table search problem challenging. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables in documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt to represent table information. In addition, we devise a novel page box-cutting method to improve the performance of table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm, TableRank. TableRank rates each (query, table) pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to examine tables automatically. We demonstrate the value of TableSeer with empirical studies on scientific documents.
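The abstract describes TableRank only at a high level: a tailored vector space model with its own term weighting scheme applied to each (query, table) pair. The sketch below illustrates the general vector-space idea under stated assumptions; it is not TableRank. It assumes each table has already been reduced to text fields with the hypothetical names `caption`, `headers`, and `footnote`, boosts terms by the field they occur in using made-up weights, applies standard smoothed TF-IDF, and ranks tables by cosine similarity to the query.

```python
import math
import re
from collections import Counter

# Hypothetical per-field boosts: terms in a table's caption or column headers
# count more than terms in its footnote. These weights are illustrative
# assumptions, not values from the TableSeer/TableRank paper.
FIELD_WEIGHTS = {"caption": 3.0, "headers": 2.0, "footnote": 1.0}


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def table_term_freqs(table):
    """Field-weighted term frequencies for one table (a dict of field -> text)."""
    tf = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for term in tokenize(table.get(field, "")):
            tf[term] += weight
    return tf


def inverse_doc_freqs(doc_tfs):
    """Smoothed inverse document frequency computed over the table collection."""
    n = len(doc_tfs)
    df = Counter()
    for tf in doc_tfs:
        df.update(tf.keys())
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def to_vector(tf, idf):
    """TF-IDF vector; terms unseen in the collection get weight 0."""
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_tables(query, tables):
    """Return (score, table_index) pairs sorted by descending cosine score."""
    doc_tfs = [table_term_freqs(t) for t in tables]
    idf = inverse_doc_freqs(doc_tfs)
    query_vec = to_vector(Counter(tokenize(query)), idf)
    scored = [(cosine(query_vec, to_vector(tf, idf)), i) for i, tf in enumerate(doc_tfs)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    # Two toy tables; the field contents are invented for illustration.
    tables = [
        {"caption": "Precision and recall of table detection",
         "headers": "method precision recall"},
        {"caption": "Corpus statistics", "headers": "year documents",
         "footnote": "precision not reported"},
    ]
    for score, index in rank_tables("table detection precision", tables):
        print(f"table {index}: {score:.3f}")
```

Weighting terms by field is one plausible way to make a vector space model "table-aware"; the actual TableRank weighting scheme is defined in the paper and may differ substantially.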

Original language: English (US)
Title of host publication: Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007
Subtitle of host publication: Building and Sustaining the Digital Environment
Pages: 91-100
Number of pages: 10
State: Published - 2007
Event: 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment - Vancouver, BC, Canada
Duration: Jun 18 2007 to Jun 23 2007

Publication series

Name: Proceedings of the ACM International Conference on Digital Libraries

Other

Other: 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment
Country/Territory: Canada
City: Vancouver, BC
Period: 6/18/07 to 6/23/07

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
