TY - JOUR
T1 - Automated analysis of immunosequencing datasets reveals novel immunoglobulin D genes across diverse species
AU - Bhardwaj, Vinnu
AU - Franceschetti, Massimo
AU - Rao, Ramesh
AU - Pevzner, Pavel A.
AU - Safonova, Yana
N1 - Publisher Copyright: © 2020 Bhardwaj et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2020/4
Y1 - 2020/4
AB - Immunoglobulin genes are formed through V(D)J recombination, which joins the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics focuses on finding alleles of germline genes across various patients. Although reconstruction of V and J genes is a well-studied problem, the more challenging task of reconstructing D genes remained open until the IgScout algorithm was developed in 2019. In this work, we address limitations of IgScout by developing a probabilistic MINING-D algorithm for D gene reconstruction, apply it to hundreds of immunosequencing datasets from multiple species, and validate the newly inferred D genes by analyzing diverse whole genome sequencing datasets and haplotyping heterozygous V genes.
UR - http://www.scopus.com/inward/record.url?scp=85084199869&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084199869&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1007837
DO - 10.1371/journal.pcbi.1007837
M3 - Article
C2 - 32339161
AN - SCOPUS:85084199869
SN - 1553-734X
VL - 16
JO - PLoS Computational Biology
JF - PLoS Computational Biology
IS - 4
M1 - e1007837
ER -