TY - GEN
T1 - TableSeer
T2 - 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment
AU - Liu, Ying
AU - Bai, Kun
AU - Mitra, Prasenjit
AU - Giles, C. Lee
PY - 2007
Y1 - 2007
N2 - Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make the table search problem challenging. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables in documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt for representing table information. In addition, we devise a novel page box-cutting method to improve the performance of table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm, TableRank. TableRank rates each (query, table) pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to automatically examine tables. We demonstrate the value of TableSeer with empirical studies on scientific documents.
AB - Tables are ubiquitous in digital libraries. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed fashion. However, current search engines do not support table search. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make the table search problem challenging. In this paper, we describe TableSeer, a search engine for tables. TableSeer crawls digital libraries, detects tables in documents, extracts table metadata, indexes and ranks tables, and provides a user-friendly search interface. We propose an extensive set of medium-independent metadata for tables that scientists and other users can adopt for representing table information. In addition, we devise a novel page box-cutting method to improve the performance of table detection. Given a query, TableSeer ranks the matched tables using an innovative ranking algorithm, TableRank. TableRank rates each (query, table) pair with a tailored vector space model and a specific term weighting scheme. Overall, TableSeer eliminates the burden of manually extracting table data from digital libraries and enables users to automatically examine tables. We demonstrate the value of TableSeer with empirical studies on scientific documents.
UR - http://www.scopus.com/inward/record.url?scp=36348992621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=36348992621&partnerID=8YFLogxK
U2 - 10.1145/1255175.1255193
DO - 10.1145/1255175.1255193
M3 - Conference contribution
AN - SCOPUS:36348992621
SN - 1595936440
SN - 9781595936448
T3 - Proceedings of the ACM International Conference on Digital Libraries
SP - 91
EP - 100
BT - Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007
Y2 - 18 June 2007 through 23 June 2007
ER -