Abstract
Proteins are biomolecules of life. They fold into a great variety of three-dimensional (3D) shapes. Underlying these folding patterns are many recurrent structural fragments or building blocks (analogous to 'LEGO® bricks'). This paper reports an innovative statistical inference approach to discover a comprehensive dictionary of protein structural building blocks from a large corpus of experimentally determined protein structures. Our approach is built on the Bayesian and information-theoretic criterion of minimum message length. To the best of our knowledge, this work is the first systematic and rigorous treatment of a very important data mining problem that arises in the cross-disciplinary area of structural bioinformatics. The quality of the dictionary we find is demonstrated by its explanatory power - any protein within the corpus of known 3D structures can be dissected into successive regions assigned to fragments from this dictionary. This induces a novel one-dimensional representation of three-dimensional protein folding patterns, suitable for application of the rich repertoire of character-string processing algorithms, for rapid identification of folding patterns of newly-determined structures. This paper presents the details of the methodology used to infer the dictionary of building blocks, and is supported by illustrative examples to demonstrate its effectiveness and utility.
| Original language | English (US) |
|---|---|
| Article number | 6729603 |
| Pages (from-to) | 1091-1096 |
| Number of pages | 6 |
| Journal | Proceedings - IEEE International Conference on Data Mining, ICDM |
| DOIs | |
| State | Published - Dec 1 2013 |
| Event | 13th IEEE International Conference on Data Mining, ICDM 2013 - Dallas, TX, United States Duration: Dec 7 2013 → Dec 10 2013 |
All Science Journal Classification (ASJC) codes
- General Engineering
Fingerprint
Dive into the research topics of 'Statistical inference of protein "LEGO bricks"'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver