TY - GEN
T1 - OreChem ChemXSeer
T2 - 10th Annual Joint Conference on Digital Libraries, JCDL 2010
AU - Li, Na
AU - Zhu, Leilei
AU - Mitra, Prasenjit
AU - Mueller, Karl
AU - Poweleit, Eric
AU - Giles, C. Lee
PY - 2010
Y1 - 2010
N2 - Representing the semantics of unstructured scientific publications will certainly facilitate access and search and hopefully lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata, even for scientific publications that potentially contain rich semantic metadata. In addition, how to search the scientific literature of linked semantic metadata is an open problem. We have developed a semantic digital library, oreChem ChemXSeer, that models chemistry papers with semantic metadata. It stores and indexes metadata extracted from a chemistry paper repository, ChemXSeer, using "compound objects". We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standard to define a compound object that aggregates metadata fields related to a digital object. Aggregated metadata can be managed and retrieved easily as one unit, which improves ease of use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query. We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier whose features are words that are typically used only to describe experiments, such as "apparatus" and "prepare". Using a dataset composed of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents.
AB - Representing the semantics of unstructured scientific publications will certainly facilitate access and search and hopefully lead to new discoveries. However, current digital libraries are usually limited to classic flat structured metadata, even for scientific publications that potentially contain rich semantic metadata. In addition, how to search the scientific literature of linked semantic metadata is an open problem. We have developed a semantic digital library, oreChem ChemXSeer, that models chemistry papers with semantic metadata. It stores and indexes metadata extracted from a chemistry paper repository, ChemXSeer, using "compound objects". We use the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) standard to define a compound object that aggregates metadata fields related to a digital object. Aggregated metadata can be managed and retrieved easily as one unit, which improves ease of use and has the potential to improve the semantic interpretation of shared data. We show how metadata can be extracted from documents and aggregated using OAI-ORE. ORE objects are created on demand; thus, we are able to search for a set of linked metadata with one query. We were also able to model new types of metadata easily. For example, chemists are especially interested in finding information related to experiments in documents. We show how paragraphs containing experiment information in chemistry papers can be extracted and tagged based on a chemistry ontology with 470 classes, and then represented in ORE along with other document-related metadata. Our algorithm uses a classifier whose features are words that are typically used only to describe experiments, such as "apparatus" and "prepare". Using a dataset composed of documents from the Royal Society of Chemistry digital library, we show that our proposed method performs well in extracting experiment-related paragraphs from chemistry documents.
UR - http://www.scopus.com/inward/record.url?scp=77955100039&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955100039&partnerID=8YFLogxK
U2 - 10.1145/1816123.1816160
DO - 10.1145/1816123.1816160
M3 - Conference contribution
AN - SCOPUS:77955100039
SN - 9781450300858
T3 - Proceedings of the ACM International Conference on Digital Libraries
SP - 245
EP - 253
BT - JCDL'10 - Digital Libraries - 10 Years Past, 10 Years Forward, a 2020 Vision
Y2 - 21 June 2010 through 25 June 2010
ER -