Project Details
Description
proposal: DMS 0348869
PI: Runze Li
institution: The Pennsylvania State University
Model Selection for Semiparametric Regression Models in High
Dimensional Modeling and its Oracle Properties
Abstract
Model selection is fundamental to high-dimensional data analysis, and
semiparametric regression models are potentially useful for analysis of
high-dimensional data. Model selection for semiparametric regression
models consists of two components: model selection (such as choice of
smoothing parameters) for the nonparametric component, and variable selection
for the parametric portion. Traditional variable selection
schemes, such as stepwise deletion and best subset
selection, could be extended to semiparametric modeling, but they are
computationally expensive because they require the smoothing parameters to
be selected for each submodel. The objectives of this proposal are to
develop new, widely applicable model selection procedures for three
classes of semiparametric models that provide a unified framework for
many existing semiparametric regression models in the literature. In this
proposal, the PI (a) studies the asymptotic behavior of the proposed
estimators, (b) demonstrates how the rate of convergence of the resulting
estimator depends on the regularization parameter, (c) shows that the proposed
procedures perform as well as the oracle procedure in variable selection
for semiparametric regression models, and (d) addresses issues related to
implementation of the proposed procedures. The PI also examines
finite-sample performance via extensive Monte Carlo simulation studies and
applies the proposed procedures to the analysis of real data.
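The computational cost of best subset selection noted above can be illustrated with a small counting sketch. It is not part of the proposal: the function names, the number of covariates, and the size of the smoothing-parameter grid are all hypothetical, and the sketch assumes that every subset of the parametric covariates is refitted once for each candidate smoothing parameter.

```python
from itertools import combinations


def enumerate_submodels(covariates):
    """Yield every subset of the covariates, including the empty one."""
    for size in range(len(covariates) + 1):
        for subset in combinations(covariates, size):
            yield subset


def count_best_subset_fits(num_covariates, num_smoothing_candidates):
    """Count model fits under the assumption stated above: each of the
    2**d submodels is fitted once per candidate smoothing parameter."""
    num_submodels = 2 ** num_covariates
    return num_submodels * num_smoothing_candidates


# Hypothetical sizes: 10 covariates and a grid of 20 smoothing parameters.
total_fits = count_best_subset_fits(10, 20)
print(total_fits)
```

The count doubles with each additional covariate, which is why exhaustive search quickly becomes impractical as the dimension grows.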
With modern data collection devices and vast data storage space, one can
easily collect high-dimensional data, such as biotech data, financial data,
satellite imagery and hyperspectral imagery. Analysis of high-dimensional
data poses many challenges for statisticians and is becoming one of the most
important research topics in statistics. This proposal (a) lays down
a well-grounded and comprehensive framework for model selection for
semiparametric regression modeling in high-dimensional data analysis,
(b) has a significant impact on future research in high-dimensional
statistical modeling, and (c) significantly enhances the availability
of statistical tools and software for high-dimensional statistical modeling.
The proposed work is incorporated into a new topic course from which
graduate students may directly benefit. The proposed work also
benefits a broad range of scientists and researchers in various fields, including
automotive engineering, medical studies, prevention studies, public health
and social sciences.
| Status | Finished |
|---|---|
| Effective start/end date | 7/1/04 → 6/30/11 |
Funding
- National Science Foundation: $440,000.00