TY - JOUR
T1 - Feature selection methods for optimal design of studies for developmental inquiry
AU - Brick, Timothy R.
AU - Koffer, Rachel E.
AU - Gerstorf, Denis
AU - Ram, Nilam
N1 - Publisher Copyright:
© The Author(s) 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Objectives: As diary, panel, and experience sampling methods become easier to implement, studies of development and aging are adopting more and more intensive study designs. However, if too many measures are included in such designs, interruptions for measurement may constitute a significant burden for participants. We propose the use of feature selection, a data-driven machine learning process, in study design and selection of measures that show the most predictive power in pilot data. Method: We introduce an analytical paradigm based on feature importance estimation and recursive feature elimination with decision tree ensembles and illustrate its utility using empirical data from the German Socio-Economic Panel (SOEP). Results: We identified a subset of 20 measures from the SOEP data set that maintain much of the ability of the original data set to predict life satisfaction and health across younger, middle, and older age groups. Discussion: Feature selection techniques permit researchers to choose measures that are maximally predictive of relevant outcomes, even when there are interactions or nonlinearities. These techniques facilitate decisions about which measures may be dropped from a study while maintaining efficiency of prediction across groups and reducing costs to the researcher and burden on the participants.
AB - Objectives: As diary, panel, and experience sampling methods become easier to implement, studies of development and aging are adopting more and more intensive study designs. However, if too many measures are included in such designs, interruptions for measurement may constitute a significant burden for participants. We propose the use of feature selection, a data-driven machine learning process, in study design and selection of measures that show the most predictive power in pilot data. Method: We introduce an analytical paradigm based on feature importance estimation and recursive feature elimination with decision tree ensembles and illustrate its utility using empirical data from the German Socio-Economic Panel (SOEP). Results: We identified a subset of 20 measures from the SOEP data set that maintain much of the ability of the original data set to predict life satisfaction and health across younger, middle, and older age groups. Discussion: Feature selection techniques permit researchers to choose measures that are maximally predictive of relevant outcomes, even when there are interactions or nonlinearities. These techniques facilitate decisions about which measures may be dropped from a study while maintaining efficiency of prediction across groups and reducing costs to the researcher and burden on the participants.
UR - http://www.scopus.com/inward/record.url?scp=85046261188&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046261188&partnerID=8YFLogxK
U2 - 10.1093/geronb/gbx008
DO - 10.1093/geronb/gbx008
M3 - Article
C2 - 28164232
AN - SCOPUS:85046261188
SN - 1079-5014
VL - 73
SP - 113
EP - 123
JO - Journals of Gerontology - Series B Psychological Sciences and Social Sciences
JF - Journals of Gerontology - Series B Psychological Sciences and Social Sciences
IS - 1
ER -