Identifying genetic associations with variability in metabolic health and blood count laboratory values: Diving into the quantitative traits by leveraging longitudinal data from an EHR

Shefali S. Verma, Anastasia M. Lucas, Daniel R. Lavage, Joseph B. Leader, Raghu Metpally, Sarathbabu Krishnamurthy, Frederick Dewey, Ingrid Borecki, Alexander Lopez, John Overton, John Penn, Jeffrey Reid, Sarah A. Pendergrass, Gerda Breitwieser, Marylyn D. Ritchie

Research output: Contribution to journal; Conference article; peer-review

11 Scopus citations


A wide range of patient health data is recorded in Electronic Health Records (EHR), including diagnoses, surgical procedures, clinical laboratory measurements, and medication information. Together, this information reflects the patient's medical history. Many studies have used EHR data to find clinically relevant associations, either by using International Classification of Diseases, version 9 (ICD-9) codes or laboratory measurements, or by designing phenotype algorithms that accurately extract case and control status from the EHR. Here we developed a strategy to use longitudinal quantitative trait data from the EHR at Geisinger Health System, starting with outpatient metabolic panel and complete blood count data. The Comprehensive Metabolic Panel (CMP) and Complete Blood Count (CBC) are part of routine care and provide a broad, high-level screening picture of a patient's overall health and disease. We randomly split our data into two datasets to allow for discovery and replication. We first conducted a genome-wide association study (GWAS) using the median values of 25 clinical laboratory measurements to identify variants, genotyped on the Human Omni Express Exome BeadChip, that are associated with these measurements. We identified 687 variants that were associated with the tested clinical measurements and replicated at p < 5×10^-8. Because longitudinal EHR data provide a record of each patient's medical history, we used this information to investigate which ICD-9 codes might be associated with differences in the variability of the measurements. We identified low-variance and high-variance patients by examining changes within each patient's longitudinal laboratory results for each of the 25 clinical lab values, creating 50 groups (a high-variance and a low-variance group for each lab variable).
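The abstract does not spell out how the per-patient summaries or the high- and low-variance groups were computed. The sketch below illustrates the general idea under stated assumptions: each patient's longitudinal results for one lab test are a list of numbers, the per-patient median is the GWAS phenotype, within-patient sample variance measures variability, and the bottom and top quartiles of that variance define the low- and high-variance groups. The quartile cutoff, the seeded 50/50 split, and all function names are hypothetical illustrations, not the authors' method.

```python
import random
import statistics
from typing import Dict, List, Tuple

# Hypothetical input: patient ID -> longitudinal results for ONE lab test (e.g. glucose).
Labs = Dict[str, List[float]]


def split_discovery_replication(patient_ids: List[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split patients into two halves: (discovery, replication)."""
    ids = sorted(patient_ids)  # sort first so the split depends only on the seed
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def median_trait(labs: Labs) -> Dict[str, float]:
    """Per-patient median of the longitudinal values, used as the GWAS phenotype."""
    return {pid: statistics.median(vals) for pid, vals in labs.items() if vals}


def variance_groups(labs: Labs, quantile: float = 0.25) -> Tuple[List[str], List[str]]:
    """Return (low_variance, high_variance) patient IDs.

    Assumption (hypothetical cutoff): patients are ranked by within-patient
    sample variance, and the bottom/top `quantile` fraction form the groups.
    """
    # Sample variance needs at least two measurements per patient.
    var = {pid: statistics.variance(v) for pid, v in labs.items() if len(v) >= 2}
    ranked = sorted(var, key=var.get)  # ascending variance
    k = int(len(ranked) * quantile)
    low = ranked[:k]
    high = ranked[len(ranked) - k:] if k > 0 else []
    return low, high
```

Running `variance_groups` once per lab test, on each half from `split_discovery_replication`, yields the two groups per lab variable described above; patients with a single measurement are dropped because their variance is undefined.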
We then performed a phenome-wide association study (PheWAS) with ICD-9 diagnosis codes, separately in the high-variance and low-variance groups for each lab variable. We found 717 PheWAS associations that replicated at p < 0.001. Next, we compared the association results between the high- and low-variance groups. For example, 39 SNPs (in multiple genes) were associated with ICD-9 250.01 (type 1 diabetes) in patients with high variance in plasma glucose levels, but not in patients with low variance in plasma glucose levels. Another example is the association of 4 SNPs in UMOD with chronic kidney disease in patients with high variance in aspartate aminotransferase (discovery p = 8.71×10^-9; replication p = 2.03×10^-6). In general, across all 25 laboratory measurements, we observed many more statistically significant associations in the high-variance groups than in the low-variance groups. This study is one of the first to use quantitative trait variance from longitudinal laboratory data to find associations between genetic variants and clinical phenotypes obtained from an EHR, integrating laboratory values and diagnosis codes to understand the genetic complexity of common diseases.
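The comparison of replicated associations between variance groups can be sketched as simple set operations. This minimal sketch assumes each PheWAS result carries an effect estimate and p-value from both the discovery and replication halves, and treats an association as replicated when p < 0.001 in both halves with the same effect direction. The direction requirement, the `Assoc` record layout, and the function names are hypothetical; the abstract states only the p-value threshold.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Assoc(NamedTuple):
    """One SNP-by-ICD-9 PheWAS result (hypothetical record layout)."""
    snp: str
    icd9: str
    beta_disc: float  # effect estimate in the discovery half
    p_disc: float
    beta_rep: float   # effect estimate in the replication half
    p_rep: float


def replicated(results: Iterable[Assoc], alpha: float = 1e-3) -> Set[Tuple[str, str]]:
    """(SNP, ICD-9) pairs with p < alpha in both halves and the same effect sign (assumed rule)."""
    return {
        (r.snp, r.icd9)
        for r in results
        if r.p_disc < alpha and r.p_rep < alpha and r.beta_disc * r.beta_rep > 0
    }


def high_only(high: Iterable[Assoc], low: Iterable[Assoc], alpha: float = 1e-3) -> Set[Tuple[str, str]]:
    """Pairs that replicate in the high-variance group but not in the low-variance group."""
    return replicated(high, alpha) - replicated(low, alpha)
```

Applied per lab variable, `high_only` returns the pairs that replicate only in the high-variance group, the kind of contrast illustrated by the plasma glucose and ICD-9 250.01 example; counting `replicated` results per group gives the high-versus-low comparison summarized above.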

Original language: English (US)
Pages (from-to): 533-544
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
Issue number: 212679
State: Published - 2017
Event: 22nd Pacific Symposium on Biocomputing, PSB 2017 - Kohala Coast, United States
Duration: Jan 4 2017 to Jan 8 2017

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Computational Theory and Mathematics


