OreChem ChemXSeer: A semantic digital library for chemistry

Na Li, Leilei Zhu, Prasenjit Mitra, Karl Mueller, Eric Poweleit, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

13 Scopus citations

Abstract

Representing the semantics of unstructured scientific publications will certainly facilitate access and search and hopefully lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata even for scientific publications that potentially contain rich semantic metadata. In addition, how to search the scientific literature of linked semantic metadata is an open problem. We have developed a semantic digital library, oreChem ChemXSeer, that models chemistry papers with semantic metadata. It stores and indexes metadata extracted from a chemistry paper repository, ChemXSeer, using "compound objects". We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standard to define a compound object that aggregates metadata fields related to a digital object. Aggregated metadata can be managed and retrieved easily as one unit, which improves ease of use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query. We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier whose features are words that are typically used only to describe experiments, such as "apparatus", "prepare", etc. Using a dataset comprising documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents.
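
As a rough illustration of the cue-word features mentioned in the abstract, the following minimal Python sketch turns a paragraph into binary features and applies a simple threshold rule. It is not the authors' implementation: the abstract does not name the classifier, so the threshold rule is only a stand-in for a trained model, and the function names `cue_features` and `is_experiment_paragraph`, the prefix-matching rule, and the two-word cue list (taken from the only examples the abstract gives) are assumptions made for this example.

```python
import re

# Assumption: only the two cue words named in the abstract are used here;
# the real feature vocabulary is not given in the source.
CUE_WORDS = ("apparatus", "prepare")


def cue_features(paragraph: str) -> dict:
    """Return one binary feature per cue word.

    A feature is 1 if any lowercase alphabetic token in the paragraph
    starts with the cue word (so "prepared" matches "prepare").
    """
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    return {
        cue: int(any(token.startswith(cue) for token in tokens))
        for cue in CUE_WORDS
    }


def is_experiment_paragraph(paragraph: str, threshold: int = 1) -> bool:
    """Hypothetical decision rule standing in for a trained classifier:
    flag the paragraph when at least `threshold` cue features fire."""
    return sum(cue_features(paragraph).values()) >= threshold
```

In a fuller pipeline, the feature dictionaries would presumably be fed to a learned classifier rather than a fixed threshold, and the flagged paragraphs would then be tagged against the ontology and added to the document's compound object.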

Original language: English (US)
Title of host publication: JCDL'10 - Digital Libraries - 10 Years Past, 10 Years Forward, a 2020 Vision
Pages: 245-253
Number of pages: 9
State: Published - 2010
Event: 10th Annual Joint Conference on Digital Libraries, JCDL 2010 - Gold Coast, QLD, Australia
Duration: Jun 21 2010 to Jun 25 2010

Publication series

Name: Proceedings of the ACM International Conference on Digital Libraries

Other

Other: 10th Annual Joint Conference on Digital Libraries, JCDL 2010
Country/Territory: Australia
City: Gold Coast, QLD
Period: 6/21/10 to 6/25/10

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
