Statistical approach for automated weighting of datasets: Application to heat capacity data

S. Zomorodpoosh, B. Bocklund, A. Obaied, R. Otis, Z. K. Liu, I. Roslyakova

Research output: Contribution to journal (Article, peer-reviewed)

6 Scopus citations


An essential step in CALPHAD is assigning relative weights to different datasets, but there is no consensus on the best way to do so. Currently, weights for experimental or first-principles data are assigned manually, based on the knowledge and experience of the modeler. Since this manual treatment is subjective and time-consuming, the handling of such data is moving rapidly toward automated procedures based on statistical and data-mining tools. In the present study, we propose an automated approach to determine the weight of datasets based on the K-Fold Cross-Validation method, modified so that each fold is selected non-randomly and may contain an unequal number of observations. This approach can serve researchers as a support tool for evaluating the reliability of each dataset involved in CALPHAD modeling and for quantifying the impact of weighting through statistical analysis of the corresponding model. We demonstrate the efficacy of this method through the evaluation of heat capacity data of fcc nickel, hcp magnesium, and bcc iron.
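The idea of a cross-validation in which each fold is chosen non-randomly and folds differ in size can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes that each dataset acts as one fold, uses a straight-line fit as a stand-in for a real heat capacity model, and converts held-out errors into weights with a simple inverse-squared-error rule. The function names, the synthetic data, and the weighting rule are all hypothetical and chosen only for illustration.

```python
import numpy as np


def leave_one_dataset_out_errors(datasets, degree=1):
    """Held-out RMSE per dataset, treating each dataset as one fold.

    `datasets` is a list of (temperature, heat_capacity) array pairs.
    The polynomial model is an assumption, not the model from the paper.
    """
    errors = []
    for k, (t_test, cp_test) in enumerate(datasets):
        # Train on every dataset except the k-th (folds are not random
        # and may have different sizes).
        t_train = np.concatenate(
            [t for i, (t, _) in enumerate(datasets) if i != k]
        )
        cp_train = np.concatenate(
            [cp for i, (_, cp) in enumerate(datasets) if i != k]
        )
        coeffs = np.polyfit(t_train, cp_train, degree)
        residuals = cp_test - np.polyval(coeffs, t_test)
        errors.append(np.sqrt(np.mean(residuals ** 2)))
    return np.array(errors)


def inverse_error_weights(errors, eps=1e-12):
    """Hypothetical weighting rule: inverse squared error, normalized to 1."""
    inv = 1.0 / (errors ** 2 + eps)
    return inv / inv.sum()


# Synthetic example: three datasets of unequal size drawn from the same
# linear trend; the third carries a constant offset to mimic a biased source.
rng = np.random.default_rng(0)


def make_dataset(n_points, offset=0.0, noise=0.05):
    t = np.linspace(300.0, 1000.0, n_points)
    cp = 20.0 + 0.01 * t + offset + rng.normal(0.0, noise, n_points)
    return t, cp


datasets = [make_dataset(30), make_dataset(12), make_dataset(20, offset=2.0)]
errors = leave_one_dataset_out_errors(datasets)
weights = inverse_error_weights(errors)

for name, err, w in zip("ABC", errors, weights):
    print(f"dataset {name}: held-out RMSE = {err:.3f}, weight = {w:.3f}")
```

In this toy setup the biased dataset is predicted worst by a model trained on the other two, so it receives the smallest weight. A real application would replace the straight line with a physically motivated heat capacity model and the weighting rule with the one defined in the paper.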

Original language: English (US)
Article number: 101994
Journal: Calphad: Computer Coupling of Phase Diagrams and Thermochemistry
State: Published - Dec 2020

All Science Journal Classification (ASJC) codes

  • General Chemistry
  • General Chemical Engineering
  • Computer Science Applications

