Knowledge-based approaches frequently employ empirical relations to determine effective potentials for coarse-grained protein models directly from protein databank structures. Although these approaches have enjoyed considerable success and widespread popularity in computational protein science, their fundamental basis has been widely questioned. It is well established that conventional knowledge-based approaches do not correctly treat many-body correlations between amino acids. Moreover, the physical significance of potentials determined by using structural statistics from different proteins has remained obscure. In the present work, we address both of these concerns by introducing and demonstrating a theory for calculating transferable potentials directly from a databank of protein structures. This approach assumes that the databank structures correspond to representative configurations sampled from equilibrium solution ensembles for different proteins. Given this assumption, this physics-based theory exactly treats many-body structural correlations and directly determines the transferable potentials that provide a variationally optimized approximation to the free energy landscape for each protein. We illustrate this approach by first constructing a databank of protein structures using a model potential and then quantitatively recovering this potential from the structure databank. The proposed framework will clarify the assumptions and physical significance of knowledge-based potentials, allow for their systematic improvement, and provide new insight into many-body correlations and cooperativity in folded proteins.
Proceedings of the National Academy of Sciences of the United States of America
Published - Nov 16 2010