EAGER: Collaborative Research: MATDAT18 Type-I: Development of a machine learning framework to optimize ReaxFF force field parameters.

Project: Research project

Project Details



This award supports continued collaboration of materials researchers with data scientists kindled at the MATDAT18 Datathon event. Recent advancements in technological devices, such as smart phones, batteries, and solar cells, are consequences of the discovery and application of novel materials. Computer simulations of systems of atoms could be insightful in predicting and discovering new materials. Simulations based on quantum mechanics are computationally expensive and prohibitive for all but for systems of a few atoms. Simulations involving a much larger number of atoms can be done using molecular dynamics which utilizes models for the interactions between atoms. ReaxFF is one such interaction model which can also describe chemical bonding. Currently, more than a thousand academic groups and companies are using ReaxFF to model systems of atoms. It takes many parameters to fully specify a ReaxFF model. These parameters control the interactions between atoms and must be individually optimized for different types of materials. Due to the prohibitively large number of possible combinations of parameters, this optimization process is time consuming and complex, and consequently limits the applicability of ReaxFF. A procedure that can produce optimum parameter sets within a reasonable time will facilitate novel material research by accelerating the investigation of underlying physics and chemistry on the scale of atoms. Recent developments in machine learning are promising in terms of solving such high dimensional global optimization problems. The goal of this study is to develop a procedure that will enable fast and high-quality force field development using machine learning models and make this procedure accessible to all current and future ReaxFF users.

The results of this project can also be applied to other large-scale multi-objective optimization problems and can have impacts on many scientific disciplines that involve large and complex data. The developed machine learning code and optimization procedure will be shared with researchers through the Materials Computation Center at Penn State University and GitHub. Some outreach programs will be conducted for educating the next generation of materials scientists, data scientists and statisticians. The research teams will create diverse environments in their laboratories in terms of race, gender and national origin. The research will also provide an excellent opportunity to recruit students from underrepresented groups to participate in projects at the interface between materials science, data science, and statistics and is highly relevant to societal needs.


This award supports continued collaboration between a materials researcher and a data scientist kindled at the MATDAT18 Datathon event. ReaxFF is a commonly used reactive force field method, capable of simulating bond formation and dissociation in large atomistic systems. In order to reveal the physics behind these systems accurately by using the ReaxFF simulations, the force field parameters must be optimized for each different materials system, and the high-dimensional force field parameter landscape should be explored thoroughly during optimization. However, the large number of existing parameters limit the optimization stage of the force field development, as the conventional optimization approaches become time-consuming. This challenge can be resolved by the development of an efficient optimization framework. In this project, an efficient sequential optimization framework will be developed, including a 'minimum energy' sequential search and a novel 'divide-and-conquer' strategy for efficient Gaussian process modeling. This study will make ReaxFF force field development more practical, which will enable fast access to physics and chemistry in a wide range of material systems to enhance novel material design.

This project can serve is an example of how rigorous statistical/machine learning methods can be used to tackle important problems in materials science and engineering. The project may be transformative, as it can empower the atomistic-scale understanding of materials systems by using novel techniques in data science and machine learning. The developed iterative optimization procedure will be combined under Python programming language to facilitate implementation to commercial molecular dynamics packages. From a statistical point of view, the idea of divide-and-conquer and design-based subsample aggregation to reduce computational complexity of Gaussian process modeling is innovative. It can open a new path in statistics/data science with big data settings and can lead to advances in machine learning and optimization. The sequential optimization framework constructed for high-dimensional problems may open new avenues for studying problems with massive and complex input structure and energize both theoretical and applied research in statistics and machine learning.

The award is jointly funded through the Division of Materials Research and the Division of Mathematical Sciences in the Mathematical and Physical Sciences Directorate.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Effective start/end date10/1/189/30/20


  • National Science Foundation: $140,000.00


Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.