TY - GEN
T1 - Scientific data and document processing in ChemXSeer
AU - Mitra, Prasenjit
AU - Giles, C. Lee
AU - Sun, Bingjun
AU - Liu, Ying
AU - Jaiswal, Anuj R.
PY - 2008
Y1 - 2008
AB - ChemXSeer is a digital library and a data repository for the chemistry domain. The data deposited into our repository is linked with digital documents to create aggregates of resources representing the links between the data and the articles in which the data is reported. ChemXSeer enables the user to annotate the data using a metadata capturing tool. The metadata is indexed and searched to return relevant datasets to the user. ChemXSeer extracts chemical formulae and chemical names, disambiguates them and indexes them to allow for domain-knowledge enhanced search capabilities. As search engines mature, we foresee such vertical search engines, employing domain-specific knowledge to perform information extraction and indexing, especially for scientific domains, becoming more popular. Though substantial research has been pursued on information extraction from text, extracting information from tables and figures has received little attention. In the ChemXSeer project, we are building tools that allow automatic extraction of tables and figures.
UR - http://www.scopus.com/inward/record.url?scp=52449112951&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=52449112951&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:52449112951
SN - 9781577353614
T3 - AAAI Spring Symposium - Technical Report
SP - 51
EP - 56
BT - Semantic Scientific Knowledge Integration - Papers from the AAAI Spring Symposium, Technical Report
T2 - 2008 AAAI Spring Symposium
Y2 - 26 March 2008 through 28 March 2008
ER -