In alignments of 1969 protein sequences, glycine and other amino acids were found concentrated at the most-conserved sites within ~15 Å of the catalytic/active centers (C/ACs) of highly conserved kinases, dehydrogenases or lyases of Archaea, Bacteria and Eukaryota. Lysine and glutamic acid were concentrated at the least-conserved sites, furthest from the C/ACs. Logistic-regression analyses corroborated the "movement" of glycine towards and lysine away from the C/ACs: for every 10 Å of distance from the C/AC, the odds of a glycine occupying a site decreased by 19%, while the odds of a lysine occupying a site increased by 53%. Average conservation of MSA consensus sites was highest around the C/AC and decreased steadily toward the models' peripheries. The findings held with statistical confidence when sequences were restricted to individual Domains, to individual enzyme classes, or to both. Our data describe variability in the rate of mutation and in likelihoods for phylogenetic trees based on protein sequence data, and endorse extending substitution models to incorporate data on conservation and distance to C/ACs rather than using only cumulative levels. The data support the view that, in the most-conserved environment immediately surrounding the C/AC of taxonomically distant and highly conserved essential enzymes of central metabolism, there are amino acids whose identity and degree of occupancy are similar to a proposed amino acid set and frequency associated with prebiotic evolution.
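To make the reported odds changes concrete, the short Python sketch below converts a percent change in odds per 10 Å into the slope it implies and then into an odds multiplier at another distance. It assumes the usual logistic-regression form in which the log-odds are linear in distance from the C/AC; the abstract does not give the full model specification, and the function names `coef_per_angstrom` and `odds_multiplier` are hypothetical, chosen only for illustration.

```python
import math


def coef_per_angstrom(pct_change_per_10A: float) -> float:
    """Slope (log-odds per Å) implied by a percent change in odds per 10 Å.

    Assumes log-odds are linear in distance, so the odds ratio over 10 Å
    is exp(10 * slope).
    """
    odds_ratio_per_10A = 1.0 + pct_change_per_10A / 100.0
    return math.log(odds_ratio_per_10A) / 10.0


def odds_multiplier(pct_change_per_10A: float, distance_A: float) -> float:
    """Factor by which the odds change after moving distance_A Å from the C/AC."""
    return math.exp(coef_per_angstrom(pct_change_per_10A) * distance_A)


# Percent changes per 10 Å as reported in the abstract.
glycine_at_20A = odds_multiplier(-19.0, 20.0)  # 0.81 ** 2
lysine_at_20A = odds_multiplier(53.0, 20.0)    # 1.53 ** 2

print(f"glycine odds multiplier at 20 Å: {glycine_at_20A:.4f}")
print(f"lysine odds multiplier at 20 Å:  {lysine_at_20A:.4f}")
```

Under this linearity assumption the effect compounds multiplicatively: at 20 Å the glycine odds are scaled by 0.81 squared and the lysine odds by 1.53 squared, relative to a site at the C/AC.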
All Science Journal Classification (ASJC) codes
- Ecology, Evolution, Behavior and Systematics
- Space and Planetary Science