Taxonomic quasi-primes: peptides charting lineage-specific adaptations and disease-relevant loci

  • Eleftherios Bochalis
  • Michail Patsakis
  • Nikol Chantzi
  • Ioannis Mouratidis
  • Dionysios V. Chartoumpekis
  • Ilias Georgakopoulos-Soares

Research output: Contribution to journal › Article › peer-review

Abstract

The identification of succinct, universal fingerprints that enable the characterization of individual taxonomies can reveal insights into trait development. Here, we introduce taxonomic quasi-primes, peptide k-mer sequences that are exclusively present in a specific taxonomy and absent from all others. By analyzing 24,073 reference proteomes, we identified these unique peptides at the superkingdom, kingdom, and phylum ranks. These sequences exhibit remarkable uniqueness at six- and seven-amino-acid lengths. For instance, the seven-mer SAPNYCY is found in 98.11% of eukaryotic species, while being completely absent from archaeal, bacterial, and viral reference proteomes. Functional analysis demonstrated that taxonomic quasi-prime-containing proteins are enriched for processes defining a lineage, such as synaptic signaling in Chordata. Structural analysis revealed that these peptides are preferentially located within proteins, participating directly in enzymatic active sites, mediating protein–protein interactions, and stabilizing ligand binding. Moreover, we show that in human proteins, highly conserved Chordata quasi-prime loci are 2.08-fold more likely to harbor pathogenic variants than surrounding regions, directly linking these evolutionary signatures to disease. This study establishes taxonomic quasi-primes as markers that illuminate evolutionary pathways and provide a powerful method for identifying functionally indispensable and disease-relevant loci, which warrant further therapeutic and diagnostic investigation.
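The definition above (a k-mer present in one taxonomy and absent from all others) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the input layout, and the toy sequences are hypothetical, and only the seven-mer SAPNYCY is taken from the abstract. The sketch assumes each species is a list of protein strings and each species maps to a single taxon label.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Return the set of all length-k substrings of a protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def taxonomic_quasi_primes(proteomes, taxon_of, k):
    """Map each taxon to {k-mer: fraction of its species containing it},
    keeping only k-mers that occur in no species outside that taxon."""
    taxa_with_kmer = defaultdict(set)   # k-mer -> taxa in which it occurs
    species_count = defaultdict(int)    # k-mer -> number of species containing it
    taxon_sizes = defaultdict(int)      # taxon -> number of species

    for species, proteins in proteomes.items():
        taxon = taxon_of[species]
        taxon_sizes[taxon] += 1
        # Collapse all proteins of a species into one k-mer set.
        species_kmers = set().union(*(kmers(p, k) for p in proteins))
        for kmer in species_kmers:
            taxa_with_kmer[kmer].add(taxon)
            species_count[kmer] += 1

    result = defaultdict(dict)
    for kmer, taxa in taxa_with_kmer.items():
        if len(taxa) == 1:              # seen in exactly one taxon
            (taxon,) = taxa
            result[taxon][kmer] = species_count[kmer] / taxon_sizes[taxon]
    return dict(result)


# Toy input (hypothetical sequences, for illustration only).
proteomes = {
    "euk_a": ["MSAPNYCYK"],
    "euk_b": ["SAPNYCYLL", "MKTAYIA"],
    "bac_a": ["MKTAYIAKQR"],
}
taxon_of = {"euk_a": "Eukaryota", "euk_b": "Eukaryota", "bac_a": "Bacteria"}

result = taxonomic_quasi_primes(proteomes, taxon_of, k=7)
print(result["Eukaryota"]["SAPNYCY"])   # fraction of toy eukaryotic species with the k-mer
```

In this toy input, MKTAYIA occurs in both a eukaryotic and a bacterial species, so it is excluded from both taxa. The per-taxon fraction plays the role of the species coverage reported in the abstract (for example, 98.11% for SAPNYCY in eukaryotes).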

Original language: English (US)
Article number: e70241
Journal: Protein Science
Volume: 34
Issue number: 9
DOIs
State: Published - Sep 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Molecular Biology
