Engineers often employ multi-fidelity computational models, formally or informally, to aid design decision making. For example, the recently proposed view of design as a Sequential Decision Process (SDP) provides a formal framework for sequencing multi-fidelity models to realize computational gains in the design process. Efficiency is achieved in the SDP because dominated designs are removed using less expensive (low-fidelity) models before higher-fidelity models are applied, with the guarantee that an antecedent model removes only design solutions that would also be dominated when analyzed using more detailed, higher-fidelity models. The set of multi-fidelity models and the discrete decision states together give rise to a combinatorial number of modeling sequences, some of which require significantly fewer model evaluations than others. It is desirable to sequence models optimally; however, the optimal modeling policy cannot be determined at the onset of the SDP because the computational cost and discriminatory power of executing all models on all designs are unknown. In this study, the model selection problem is formulated as a Markov Decision Process, and a classical reinforcement learning algorithm, namely Q-learning, is investigated to obtain and follow an approximately optimal modeling policy. The outcome is a methodology that learns an efficient sequencing of models by estimating their computational cost and discriminatory power while analyzing designs in the tradespace throughout the design process. Through application to a design example, the methodology is shown to: 1) effectively identify the approximately optimal modeling policy, and 2) efficiently converge upon a choice set.
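
To make the idea of learning a modeling policy concrete, the following is a minimal sketch of tabular Q-learning on a toy model selection problem. It is an illustration under stated assumptions, not the formulation used in this study: the state is taken to be the number of designs remaining, the two models, their per-design costs, and their pruning behavior are all hypothetical, and the reward is simply the negative computational cost of an evaluation.

```python
import random

# Hypothetical toy setup (not taken from the study): two models with
# assumed per-design evaluation costs.
LOW, HIGH = 0, 1
COST_PER_DESIGN = {LOW: 1.0, HIGH: 5.0}
STATES = (8, 4, 2)   # number of designs still in the tradespace
TERMINAL = 1         # a single design remains: the choice set


def step(n_designs, model):
    """Apply a model to all remaining designs; return (next_state, reward)."""
    reward = -COST_PER_DESIGN[model] * n_designs  # reward = negative cost
    if model == HIGH:
        next_state = TERMINAL            # assumed to resolve all designs
    elif n_designs > 2:
        next_state = n_designs // 2      # assumed to prune half the designs
    else:
        next_state = n_designs           # assumed unable to discriminate further
    return next_state, reward


def q_learning(episodes=2000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in (LOW, HIGH)}
    for _ in range(episodes):
        s = STATES[0]
        while s != TERMINAL:
            if rng.random() < epsilon:
                a = rng.choice((LOW, HIGH))                    # explore
            else:
                a = max((LOW, HIGH), key=lambda m: q[(s, m)])  # exploit
            s_next, r = step(s, a)
            future = 0.0 if s_next == TERMINAL else max(
                q[(s_next, LOW)], q[(s_next, HIGH)])
            # Standard Q-learning update toward the one-step target.
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = s_next
    return q


def greedy_policy(q):
    """Pick the model with the highest learned value in each state."""
    return {s: max((LOW, HIGH), key=lambda m: q[(s, m)]) for s in STATES}


q_table = q_learning()
policy = greedy_policy(q_table)
```

Under these assumed costs, working the problem by hand gives a total cost of 8 + 4 + 10 = 22 for applying the low-fidelity model twice and then the high-fidelity model, against 40 for applying the high-fidelity model immediately, so the learned greedy policy should defer the expensive model until only two designs remain.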