TY - GEN
T1 - Information extraction from nanotoxicity related publications
AU - Xiao, Lemin
AU - Tang, Kaizhi
AU - Liu, Xiong
AU - Yang, Hui
AU - Chen, Zheng
AU - Xu, Roger
PY - 2013
Y1 - 2013
AB - High-quality experimental data are important for developing predictive models of nanomaterial environmental impact (NEI). Because raw data from experimental laboratories and manufacturing workplaces are usually proprietary and small in scale, extracting information from publications is an attractive alternative for collecting data. We developed an information extraction system that extracts useful information from full-text nanotoxicity-related publications. The system consists of five components: transformation of raw data into a machine-readable format, data preprocessing, ontology-based named entity recognition, rule-based numerical attribute extraction from both tables and unstructured text, and relation extraction among entities and attributes. Applied to a dataset of 94 publications, the system achieves acceptable accuracy. Storing the extracted data in a table according to the relations among the data yields a dataset that can be used to predict nanomaterial environmental impact. Such a system is unique in the current nanomaterial community and can help nanomaterial scientists and practitioners quickly locate the information they need without spending a lot of time reading articles.
UR - http://www.scopus.com/inward/record.url?scp=84894564732&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84894564732&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2013.6732723
DO - 10.1109/BIBM.2013.6732723
M3 - Conference contribution
AN - SCOPUS:84894564732
SN - 9781479913091
T3 - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
SP - 25
EP - 30
BT - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
T2 - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Y2 - 18 December 2013 through 21 December 2013
ER -