Phenome-wide association studies on cardiovascular health and fatty acids considering phenotype quality control practices for epidemiological data

Kristin Passero, Xi He, Jiayan Zhou, Bertram Mueller-Myhsok, Marcus E. Kleber, Winfried Maerz, Molly A. Hall

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations


Phenome-wide association studies (PheWAS) allow agnostic investigation of common genetic variants in relation to a variety of phenotypes, but preserving the power of PheWAS requires careful phenotypic quality control (QC) procedures. While QC of genetic data is well-defined, no established QC practices exist for multi-phenotypic data. Manually imposing sample size restrictions, identifying variable types/distributions, and locating problems such as missing data or outliers is arduous in large, multivariate datasets. In this paper, we perform two PheWAS on epidemiological data and, utilizing the novel software CLARITE (CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures), showcase a transparent and replicable phenome QC pipeline which we believe is a necessity for the field. Using data from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study, we ran two PheWAS, one on cardiac-related diseases and the other on polyunsaturated fatty acid levels. These phenotypes underwent a stringent quality control screen and were regressed on a genome-wide sample of single nucleotide polymorphisms (SNPs). Seven SNPs were significant in association with dihomo-γ-linolenic acid, of which five were within fatty acid desaturases FADS1 and FADS2. PheWAS is a useful tool to elucidate the genetic architecture of complex disease phenotypes within a single experimental framework. However, to reduce computational and multiple-comparisons burden, careful assessment of phenotype quality and removal of low-quality data is prudent. Herein we perform two PheWAS while applying a detailed phenotype QC process, for which we provide a replicable pipeline that is modifiable for application to other large datasets with heterogeneous phenotypes. As investigation of complex traits continues beyond traditional genome-wide association studies (GWAS), such QC considerations and tools such as CLARITE are crucial to the analysis of non-genetic big data such as clinical measurements, lifestyle habits, and polygenic traits.
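
The QC steps named in the abstract (sample size restrictions, variable typing, and removal of uninformative phenotypes) can be illustrated with a minimal sketch. This is not the CLARITE API and not the authors' pipeline: the function names `classify` and `phenotype_qc`, the `min_n` threshold, and the `max_categories` cutoff are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional

# A phenotype column; None marks a missing observation.
Column = List[Optional[float]]


def classify(values: Column, max_categories: int = 6) -> str:
    """Assign a coarse variable type from the count of distinct observed values.

    The max_categories cutoff is an illustrative assumption.
    """
    distinct = {v for v in values if v is not None}
    if len(distinct) <= 1:
        return "constant"
    if len(distinct) == 2:
        return "binary"
    if len(distinct) <= max_categories:
        return "categorical"
    return "continuous"


def phenotype_qc(data: Dict[str, Column], min_n: int = 200) -> Dict[str, str]:
    """Return {phenotype name: variable type} for phenotypes that pass QC.

    A phenotype is dropped if it has fewer than min_n non-missing
    observations or if it shows no variation. min_n is an assumed threshold.
    """
    kept: Dict[str, str] = {}
    for name, values in data.items():
        n_observed = sum(v is not None for v in values)
        if n_observed < min_n:
            continue  # too few observations to test reliably
        kind = classify(values)
        if kind == "constant":
            continue  # no variation, nothing to regress
        kept[name] = kind
    return kept
```

Dropping phenotypes before the regression step shrinks the number of tests, which is how QC of this kind reduces the multiple-comparisons burden the abstract mentions. Outlier handling is omitted here because it depends on the chosen variable type and distribution.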

Original language: English (US)
Pages (from-to): 659-670
Number of pages: 12
Journal: Pacific Symposium on Biocomputing
Issue number: 2020
State: Published - 2020
Event: 25th Pacific Symposium on Biocomputing, PSB 2020 - Big Island, United States
Duration: Jan 3 2020 to Jan 7 2020

All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Computational Theory and Mathematics

