TY - JOUR
T1 - An automated framework for hypotheses generation using literature
AU - Abedi, Vida
AU - Zand, Ramin
AU - Yeasin, Mohammed
AU - Faisal, Fazle Elahi
N1 - Funding Information: This work was supported by the Electrical and Computer Engineering Department and Bioinformatics Program at the University of Memphis, by the University of Tennessee Health Science Center (UTHSC), as well as by NSF grant NSF-IIS-0746790. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding institution.
PY - 2012
Y1 - 2012
N2 - Background: In biomedicine, exploratory studies and hypothesis generation often begin with researching existing literature to identify a set of factors and their association with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a disease when they plan to generate a new hypothesis or study a biological phenomenon. The situation is even worse for junior investigators, who often find it difficult to formulate new hypotheses or, more importantly, to corroborate whether their hypothesis is consistent with existing literature. It is a daunting task to keep abreast of so much being published and also to remember all combinations of direct and indirect associations. Fortunately, there is a growing trend of using literature mining and knowledge discovery tools in biomedical research. However, there is still a large gap between the huge amount of effort and resources invested in disease research and the little effort spent on harvesting the published knowledge. The proposed hypothesis generation framework (HGF) finds crisp semantic associations among entities of interest, which is a step towards bridging such gaps. Methodology: The proposed HGF shares similar end goals with SWAN but is more holistic in nature and was designed and implemented using scalable and efficient computational models of disease-disease interaction. The integration of mapping ontologies with latent semantic analysis is critical in capturing domain-specific direct and indirect crisp associations, and in making assertions about entities (such as disease X is associated with a set of factors Z). Results: Pilot studies were performed using two diseases. A comparative analysis of the computed associations and assertions with curated expert knowledge was performed to validate the results. It was observed that the HGF is able to capture crisp direct and indirect associations, and provide knowledge discovery on demand. Conclusions: The proposed framework is fast, efficient, and robust in generating new hypotheses to identify factors associated with a disease. A fully integrated Web service application is being developed for wide dissemination of the HGF. A large-scale study by domain experts and associated researchers is underway to validate the associations and assertions computed by the HGF.
AB - Background: In biomedicine, exploratory studies and hypothesis generation often begin with researching existing literature to identify a set of factors and their association with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a disease when they plan to generate a new hypothesis or study a biological phenomenon. The situation is even worse for junior investigators, who often find it difficult to formulate new hypotheses or, more importantly, to corroborate whether their hypothesis is consistent with existing literature. It is a daunting task to keep abreast of so much being published and also to remember all combinations of direct and indirect associations. Fortunately, there is a growing trend of using literature mining and knowledge discovery tools in biomedical research. However, there is still a large gap between the huge amount of effort and resources invested in disease research and the little effort spent on harvesting the published knowledge. The proposed hypothesis generation framework (HGF) finds crisp semantic associations among entities of interest, which is a step towards bridging such gaps. Methodology: The proposed HGF shares similar end goals with SWAN but is more holistic in nature and was designed and implemented using scalable and efficient computational models of disease-disease interaction. The integration of mapping ontologies with latent semantic analysis is critical in capturing domain-specific direct and indirect crisp associations, and in making assertions about entities (such as disease X is associated with a set of factors Z). Results: Pilot studies were performed using two diseases. A comparative analysis of the computed associations and assertions with curated expert knowledge was performed to validate the results. It was observed that the HGF is able to capture crisp direct and indirect associations, and provide knowledge discovery on demand. Conclusions: The proposed framework is fast, efficient, and robust in generating new hypotheses to identify factors associated with a disease. A fully integrated Web service application is being developed for wide dissemination of the HGF. A large-scale study by domain experts and associated researchers is underway to validate the associations and assertions computed by the HGF.
UR - http://www.scopus.com/inward/record.url?scp=84865434310&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865434310&partnerID=8YFLogxK
U2 - 10.1186/1756-0381-5-13
DO - 10.1186/1756-0381-5-13
M3 - Article
AN - SCOPUS:84865434310
SN - 1756-0381
VL - 5
JO - BioData Mining
JF - BioData Mining
IS - 1
M1 - 13
ER -