Informatics and machine learning to define the phenotype

Anna Okula Basile, Marylyn De Riggi Ritchie

Research output: Contribution to journal › Review article › peer-review

32 Citations (SciVal)


Introduction: For the past decade, the focus of complex disease research has been the genotype. From technological advancements to the development of analysis methods, great progress has been made. However, advances in our definition of the phenotype have remained stagnant. Phenotype characterization has recently emerged as an exciting area of informatics and machine learning. The copious amounts of diverse biomedical data that have been collected may be leveraged with data-driven approaches to elucidate trait-related features and patterns.

Areas covered: In this review, the authors discuss the phenotype in traditional genetic associations and the challenges this has imposed. Approaches for phenotype refinement that can aid in more accurate characterization of traits are also discussed. Further, the authors highlight promising machine learning approaches for establishing a phenotype and the challenges of electronic health record (EHR)-derived data.

Expert commentary: The authors hypothesize that through unsupervised machine learning, data-driven approaches can be used to define phenotypes rather than relying on expert clinician knowledge. Through the use of machine learning and an unbiased set of features extracted from clinical repositories, researchers will have the potential to further understand complex traits and identify patient subgroups. This knowledge may lead to more preventative and precise clinical care.

Original language: English (US)
Pages (from-to): 219-226
Number of pages: 8
Journal: Expert Review of Molecular Diagnostics
Issue number: 3
State: Published - Mar 4 2018

All Science Journal Classification (ASJC) codes

  • Pathology and Forensic Medicine
  • Molecular Medicine
  • Molecular Biology
  • Genetics

