Regression imputation optimizing sample size and emulation: Demonstrations and comparisons to prominent methods

Gary F. Templeton, Martin Kang, Nargess Tahmasbi

Research output: Contribution to journal, Article, peer-review

9 Scopus citations

Abstract

Missing input values weaken the ability of information systems (IS) researchers to make calculations, thereby reducing effective sample sizes and statistical power. Such technical problems with data cascade into scientific limitations, resulting in the neglect of social and economic issues. Extensive missing values in data therefore force researchers to make crucial decisions, such as whether to impute and, if so, which strategy to use. This study presents a single imputation approach that integrates and extends best practices for mitigating the effects of missing values. Using an array of missing value situations, we illustrate the Regression Imputation Optimizing Sample Size and Emulation (RIOSSE) method. The approach derives an imputation model for each low-sample variable, leveraging information available in large-sample inputs within the same data source. RIOSSE derives imputation equations with two competing goals in mind: 1) statistical power and 2) emulation. Direct comparisons demonstrate that RIOSSE is superior to three prominent multiple imputation methods (K-Nearest Neighbor, missForest, and LASSO) on two criteria each for achieving statistical power (parsimoniousness and sample size) and emulation (predictiveness and content validity). Further, 5-fold cross-validation confirmed the head-to-head goal criteria comparisons. The paper contributes 1) a description of the RIOSSE method, 2) new imputation performance metrics and visualizations, 3) comparisons of our proposed method to three prominent multiple imputation methods, and 4) specified imputation models for 30 commonly used inputs to firm performance calculations.
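
To make the general idea of regression imputation concrete, the following is a minimal sketch of plain single regression imputation: fit an ordinary least squares model on the rows where a sparsely observed variable is present, then predict it where it is missing. This is not the RIOSSE procedure itself, whose predictor selection and trade-off between statistical power and emulation are not specified in the abstract. The function name `regression_impute` and the toy `revenue` and `cost` values are hypothetical and chosen only for illustration.

```python
import numpy as np

def regression_impute(target, predictors):
    """Fill NaNs in `target` with OLS predictions from `predictors`.

    Illustrative only: a generic regression imputation, not RIOSSE.
    """
    target = np.asarray(target, dtype=float)
    X = np.asarray(predictors, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)

    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(len(X)), X])

    observed = ~np.isnan(target)        # rows where the target is known
    usable = ~np.isnan(X).any(axis=1)   # rows with complete predictors

    # Fit on rows where both the target and the predictors are present.
    fit_rows = observed & usable
    coef, *_ = np.linalg.lstsq(design[fit_rows], target[fit_rows], rcond=None)

    # Impute only rows with a missing target and complete predictors.
    filled = target.copy()
    fill_rows = ~observed & usable
    filled[fill_rows] = design[fill_rows] @ coef
    return filled, coef

# Hypothetical toy data: `cost` is sparsely observed, `revenue` is complete.
revenue = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
cost = np.array([6.0, 11.0, np.nan, 21.0, np.nan])

cost_filled, coef = regression_impute(cost, revenue)
```

In this toy case the observed points lie exactly on a line, so the imputed values follow that line. With real data, the choice of predictors determines both how many rows can be imputed and how closely the imputed values track the original variable, which is the tension the abstract describes.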

Original language: English (US)
Article number: 113624
Journal: Decision Support Systems
Volume: 151
State: Published - Dec 2021

All Science Journal Classification (ASJC) codes

  • Management Information Systems
  • Information Systems
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Information Systems and Management
