A method for model selection using reinforcement learning when viewing design as a sequential decision process

Jaskanwal P.S. Chhabra, Gordon P. Warn

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

In an emerging paradigm, design is viewed as a sequential decision process (SDP) in which mathematical models of increasing fidelity are used in sequence to systematically contract the set of design alternatives. The key idea behind the SDP is that sequencing models of increasing fidelity provides successively tighter bounds on the decision criteria, removing inefficient designs from the tradespace with the guarantee that an antecedent model only removes design solutions that would also be dominated when analyzed with the more detailed, higher-fidelity model. In general, efficiency in the SDP is achieved by using less expensive (low-fidelity) models early in the design process and reserving high-fidelity models for later. However, the set of multi-fidelity models and discrete decision states gives rise to a combinatorial number of possible model sequences, some of which require significantly fewer model evaluations than others. Unfortunately, the optimal modeling policy cannot be determined at the outset of the SDP because the computational cost of executing each model on each design, and the discriminatory power of the resulting bounds, are unknown a priori. In this paper, the model selection problem is formulated as a finite Markov decision process (MDP), and an online reinforcement learning (RL) algorithm, namely Q-learning, is used to obtain and follow an approximately optimal modeling policy, thereby overcoming this limitation of the current SDP. The outcome is a Reinforcement Learning based Design (RL-D) methodology that learns efficient model sequences from sample estimates of the computational cost and discriminatory power of the different models while analyzing design alternatives in the tradespace throughout the design process. Through application to two design examples, RL-D is shown to (1) effectively identify an approximately optimal modeling policy and (2) efficiently converge upon a choice set.

Original language: English (US)
Pages (from-to): 1521-1542
Number of pages: 22
Journal: Structural and Multidisciplinary Optimization
Volume: 59
Issue number: 5
DOIs
State: Published - May 15 2019

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
  • Control and Optimization
