Collaborative Research: CDS&E: Inverse Design of Sequence-Defined Macromolecules using Generative Deep Learning

Project: Research project

Project Details

Description

NONTECHNICAL SUMMARY: This award supports data-intensive and computational research and education aimed at designing a special class of polymeric materials, based on long chain-like molecules, and other related materials that have a specified chemical sequence along the length of the molecule. DNA is a well-known example of these “sequence-defined macromolecules.” Because of these defined chemical sequences, the molecules can self-assemble into aggregates with precisely tailored structures and properties. These materials have many potential applications, such as tunable drug delivery for personalized medicine. However, the immense number of possible sequences (more possibilities than individual grains of sand on Earth) makes it difficult to reliably predict how the chemical sequence leads to structure and, subsequently, properties. This project therefore aims to pioneer a new machine learning approach to model the relationship between chemical sequences and the resulting structure and properties of their aggregates. The primary goal is to demonstrate the capability to recommend, on demand, chemical sequences that will yield desired characteristics. In addition to training graduate and undergraduate students studying materials science, the investigators will organize educational activities, such as technical workshops and online tutorials, to disseminate the scientific findings and promote data-driven soft material design among future STEM professionals.

TECHNICAL SUMMARY: This award supports data-intensive and computational research and education aimed at developing a new computational framework for the inverse design of sequence-defined macromolecules whose self-assembled aggregates exhibit targeted morphologies and properties. Traditional polymer physics methods are ill-equipped to handle the length scales involved (spanning from monomer sequence to large-scale aggregation) and the vast design space that is accessible through modern synthetic chemistry.
The team aims to produce a computationally efficient method for the rational design of macromolecules that self-assemble into target morphologies that satisfy precise property requirements, circumventing trial-and-error methods. Generative machine learning will be leveraged to model the joint probability distribution between sequence, structure, and property, allowing for direct approximation (and thus sampling) of the inverse functions. Through explainable AI methods, fundamental insights will be gained into how information encoded in the monomer sequence governs the physical features of self-assembled aggregates. This will have implications in diverse applications such as protein aggregation, coatings, membranes, drug delivery, and peptide nanowires. Furthermore, systematic benchmarking will be performed to provide valuable insights regarding the efficacy of different design schemes. The project will thus provide a transferable strategy for inverse molecular design, applicable to a broad range of complex soft materials. Results will be disseminated to the broader community through an annual workshop and online tutorials, leveraging an existing network of partner institutions committed to data science education in the engineering and physical science disciplines.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Active
Effective start/end date: 6/1/24 to 5/31/27

Funding

  • National Science Foundation: $277,930.00
