TY - JOUR
T1 - Deciphering highly similar multigene family transcripts from Iso-Seq data with IsoCon
AU - Sahlin, Kristoffer
AU - Tomaszkiewicz, Marta
AU - Makova, Kateryna D.
AU - Medvedev, Paul
N1 - Publisher Copyright: © 2018, The Author(s).
PY - 2018/12/1
Y1 - 2018/12/1
N2 - A significant portion of genes in vertebrate genomes belongs to multigene families, with each family containing several gene copies whose presence/absence, as well as isoform structure, can be highly variable across individuals. Existing de novo techniques for assaying the sequences of such highly-similar gene families fall short of reconstructing end-to-end transcripts with nucleotide-level precision or assigning alternatively spliced transcripts to their respective gene copies. We present IsoCon, a high-precision method using long PacBio Iso-Seq reads to tackle this challenge. We apply IsoCon to nine Y chromosome ampliconic gene families and show that it outperforms existing methods on both experimental and simulated data. IsoCon has allowed us to detect an unprecedented number of novel isoforms and has opened the door for unraveling the structure of many multigene families and gaining a deeper understanding of genome evolution and human diseases.
UR - http://www.scopus.com/inward/record.url?scp=85056075328&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056075328&partnerID=8YFLogxK
U2 - 10.1038/s41467-018-06910-x
DO - 10.1038/s41467-018-06910-x
M3 - Article
C2 - 30389934
AN - SCOPUS:85056075328
SN - 2041-1723
VL - 9
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 4601
ER -