CAREER: Model Selection for Semiparametric Regression Models in High Dimensional Modeling and its Oracle Properties


Project Details


Proposal: DMS 0348869

PI: Runze Li

Institution: The Pennsylvania State University



Model selection is fundamental to high-dimensional data analysis, and semiparametric regression models are potentially useful for analyzing high-dimensional data. Model selection for semiparametric regression models consists of two components: model selection (such as the choice of smoothing parameters) for the nonparametric component, and variable selection for the parametric component. Traditional variable selection schemes, such as stepwise deletion and best subset selection, could be extended to semiparametric modeling. However, they are computationally expensive because the smoothing parameters must be selected anew for each submodel. The objective of this proposal is to develop new, widely applicable model selection procedures for three classes of semiparametric models. Together, these classes provide a unified framework for many existing semiparametric regression models in the literature. In this proposal, the PI (a) studies the asymptotic behavior of the proposed estimators, (b) demonstrates how the rate of convergence of the resulting estimators depends on the regularization parameter, (c) shows that the proposed procedures perform as well as the oracle procedure (one that knows in advance which coefficients are zero) in variable selection for semiparametric regression models, and (d) addresses issues related to the implementation of the proposed procedures. The PI also examines finite-sample performance through extensive Monte Carlo simulation studies and applies the proposed procedures to the analysis of real data.
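The computational burden of traditional selection schemes noted above can be made concrete by counting smoother fits. The Python sketch below assumes a hypothetical tuning scheme in which each submodel's smoothing parameter is chosen by a grid search over `grid_size` candidate values. The function names and the grid-search scheme are illustrative assumptions, not part of the proposed procedures.

```python
from math import comb


def best_subset_fits(d: int, grid_size: int) -> int:
    """Smoother fits needed by best subset selection over d parametric covariates.

    Hypothetical assumption: the smoothing parameter is re-tuned for every
    submodel by a grid search over `grid_size` candidate values.
    """
    # Every subset of the d covariates is a candidate submodel: sum_k C(d, k) = 2**d.
    n_submodels = sum(comb(d, k) for k in range(d + 1))
    return n_submodels * grid_size


def backward_deletion_fits(d: int, grid_size: int) -> int:
    """Smoother fits needed by backward stepwise deletion run down to the empty model.

    The full model is fit once. When k covariates remain, each of the k
    single-covariate deletions is fit, so the count is 1 + d + (d - 1) + ... + 1.
    """
    n_submodels = 1 + d * (d + 1) // 2
    return n_submodels * grid_size


if __name__ == "__main__":
    d, grid_size = 20, 50
    print(best_subset_fits(d, grid_size))        # 52428800
    print(backward_deletion_fits(d, grid_size))  # 10550
```

Under these assumptions, best subset selection grows exponentially in d and backward deletion grows quadratically. In both cases, the tuning cost is multiplied by the number of submodels visited.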

With modern data collection devices and vast data storage capacity, one can easily collect high-dimensional data, such as biotech data, financial data, satellite imagery, and hyperspectral imagery. Analysis of high-dimensional data poses many challenges for statisticians and is becoming one of the most important research topics in statistics. This proposal (a) lays down a well-grounded and comprehensive framework for model selection in semiparametric regression modeling for high-dimensional data analysis, (b) has significant impact on future research in high-dimensional statistical modeling, and (c) significantly enhances the availability of statistical tools and software for high-dimensional statistical modeling.

The proposed work is incorporated into a new special-topics course from which graduate students may benefit directly. The proposed work also benefits a broad range of scientists and researchers in fields including automotive engineering, medical studies, prevention studies, public health, and the social sciences.

Effective start/end date: 7/1/04 to 6/30/11


  • National Science Foundation: $440,000.00

