TY - JOUR
T1 - Universal Architectural Concepts Underlying Protein Folding Patterns
AU - Konagurthu, Arun S.
AU - Subramanian, Ramanan
AU - Allison, Lloyd
AU - Abramson, David
AU - Stuckey, Peter J.
AU - Garcia de la Banda, Maria
AU - Lesk, Arthur M.
N1 - Publisher Copyright:
Copyright © 2021 Konagurthu, Subramanian, Allison, Abramson, Stuckey, Garcia de la Banda and Lesk.
PY - 2021/4/30
Y1 - 2021/4/30
N2 - What is the architectural “basis set” of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures, called concepts, typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence–structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic, provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.
UR - http://www.scopus.com/inward/record.url?scp=85106057592&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106057592&partnerID=8YFLogxK
U2 - 10.3389/fmolb.2020.612920
DO - 10.3389/fmolb.2020.612920
M3 - Article
C2 - 33996891
AN - SCOPUS:85106057592
SN - 2296-889X
VL - 7
JO - Frontiers in Molecular Biosciences
JF - Frontiers in Molecular Biosciences
M1 - 612920
ER -