TY - JOUR
T1 - Quasi-prime peptides
T2 - identification of the shortest peptide sequences unique to a species
AU - Mouratidis, Ioannis
AU - Chan, Candace S.Y.
AU - Chantzi, Nikol
AU - Tsiatsianis, Georgios Christos
AU - Hemberg, Martin
AU - Ahituv, Nadav
AU - Georgakopoulos-Soares, Ilias
N1 - Publisher Copyright: © The Author(s) 2023.
PY - 2023/6/1
Y1 - 2023/6/1
AB - Determining the organisms present in a biosample has many important applications in agriculture, wildlife conservation, and healthcare. Here, we develop a universal fingerprint based on the identification of short peptides that are unique to a specific organism. We define quasi-prime peptides as sequences that are found in only one species. We analyze proteomes from 21 875 species, from viruses to humans, and annotate the shortest peptide k-mer sequences that are unique to a species and absent from all other proteomes. We also perform simulations across all reference proteomes and observe a lower-than-expected number of peptide k-mers across species and taxonomies, indicating an enrichment for nullpeptides, sequences absent from a proteome. In humans, quasi-primes occur in genes enriched for specific gene ontology terms, including proteasome and ATP and GTP catalysis. We also provide a set of quasi-prime peptides for a number of human pathogens and model organisms, and we showcase their utility in two case studies of Mycobacterium tuberculosis and Vibrio cholerae, where we identify quasi-prime peptides in two transmembrane and extracellular proteins relevant to pathogen detection. Our catalog of quasi-prime peptides provides the smallest unit of information that is specific to a single organism at the protein level and offers a versatile tool for species identification.
UR - http://www.scopus.com/inward/record.url?scp=85160838979&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85160838979&partnerID=8YFLogxK
U2 - 10.1093/nargab/lqad039
DO - 10.1093/nargab/lqad039
M3 - Article
C2 - 37101657
AN - SCOPUS:85160838979
SN - 2631-9268
VL - 5
JO - NAR Genomics and Bioinformatics
JF - NAR Genomics and Bioinformatics
IS - 2
M1 - lqad039
ER -