Abstract
The identification of succinct, universal fingerprints that enable the characterization of individual taxonomies can reveal insights into trait development. Here, we introduce taxonomic quasi-primes, peptide k-mer sequences that are exclusively present in a specific taxonomy and absent from all others. By analyzing 24,073 reference proteomes, we identified these unique peptides at the superkingdom, kingdom, and phylum ranks. These sequences exhibit remarkable uniqueness at six- and seven-amino-acid lengths. For instance, the seven-mer SAPNYCY is found in 98.11% of eukaryotic species, while being completely absent from archaeal, bacterial, and viral reference proteomes. Functional analysis demonstrated that taxonomic quasi-prime-containing proteins are enriched for processes defining a lineage, such as synaptic signaling in Chordata. Structural analysis revealed that these peptides are preferentially located within proteins, participating directly in enzymatic active sites, mediating protein–protein interactions, and stabilizing ligand binding. Moreover, we show that in human proteins, highly conserved Chordata quasi-prime loci are 2.08-fold more likely to harbor pathogenic variants than surrounding regions, directly linking these evolutionary signatures to disease. This study establishes taxonomic quasi-primes as markers that illuminate evolutionary pathways and provide a powerful method for identifying functionally indispensable and disease-relevant loci, which warrant further therapeutic and diagnostic investigation.
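The definition above (a k-mer present in one taxonomy and absent from all others, scored by the share of that taxonomy's species carrying it) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the nested-dictionary input layout, and the toy sequences are all assumptions made for illustration, and a real analysis over thousands of reference proteomes would need a far more memory-efficient approach.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of a peptide sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def taxonomic_quasi_primes(proteomes, k):
    """Find k-mers that occur in exactly one taxon.

    proteomes: {taxon: {species: [protein sequences]}} (hypothetical layout).
    Returns {taxon: {kmer: fraction of that taxon's species containing it}}.
    """
    # kmer -> taxon -> set of species in which the k-mer was seen
    species_hits = defaultdict(lambda: defaultdict(set))
    for taxon, species_map in proteomes.items():
        for species, proteins in species_map.items():
            for protein in proteins:
                for kmer in kmers(protein, k):
                    species_hits[kmer][taxon].add(species)

    result = {taxon: {} for taxon in proteomes}
    for kmer, by_taxon in species_hits.items():
        if len(by_taxon) == 1:  # absent from every other taxon
            (taxon, species), = by_taxon.items()
            result[taxon][kmer] = len(species) / len(proteomes[taxon])
    return result


# Toy input: the sequences below are invented and only embed the
# seven-mer SAPNYCY mentioned in the abstract.
toy = {
    "Eukaryota": {"sp1": ["MSAPNYCYK"], "sp2": ["ASAPNYCYL"]},
    "Bacteria": {"sp3": ["MKTAYIAK"]},
}
qp = taxonomic_quasi_primes(toy, 7)
print(qp["Eukaryota"]["SAPNYCY"])
```

In this toy input SAPNYCY appears in both eukaryotic species and in no bacterial one, so it is reported for Eukaryota with a species coverage of 1.0; the 98.11% figure in the abstract is the same kind of ratio computed over the real reference proteomes.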
| Original language | English (US) |
|---|---|
| Article number | e70241 |
| Journal | Protein Science |
| Volume | 34 |
| Issue number | 9 |
| DOIs | |
| State | Published - Sep 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
All Science Journal Classification (ASJC) codes
- Biochemistry
- Molecular Biology
Publication title: Taxonomic quasi-primes: peptides charting lineage-specific adaptations and disease-relevant loci