TY - JOUR
T1 - Efficient estimation of population-level summaries in general semiparametric regression models
AU - Maity, Arnab
AU - Ma, Yanyuan
AU - Carroll, Raymond J.
N1 - Funding Information:
Arnab Maity is a Graduate Student, Yanyuan Ma is Assistant Professor, and Raymond J. Carroll is Distinguished Professor, Department of Statistics, Texas A&M University, College Station, TX 77843. This work was supported by grants from the National Cancer Institute (CA57030 for AM and RJC; CA74552 for YM) and by the Texas A&M Center for Environmental and Rural Health via a grant from the National Institute of Environmental Health Sciences (P30-ES09106). The authors are grateful to Janet Tooze, Amy Subar, Victor Kipnis, and Douglas Midthune for introducing us to the problem of episodically consumed foods and for allowing us to use their data. The authors thank Naisyin Wang for reading the final manuscript and helping us with replies to a referee. Part of the original work of the last two authors originally occurred during a visit to the Centre of Excellence for Mathematics and Statistics of Complex Systems at the Australian National University, whose support they gratefully acknowledge. The authors especially wish to thank three referees, the associate editor, and the joint editor for helping turn the original submission into a publishable article. Their patience and many helpful suggestions are very greatly appreciated.
PY - 2007/3
Y1 - 2007/3
AB - This article considers a wide class of semiparametric regression models in which interest focuses on population-level quantities that combine both the parametric and the nonparametric parts of the model. Special cases in this approach include generalized partially linear models, generalized partially linear single-index models, structural measurement error models, and many others. For estimating the parametric part of the model efficiently, profile likelihood kernel estimation methods are well established in the literature. Here our focus is on estimating general population-level quantities that combine the parametric and nonparametric parts of the model (e.g., population mean, probabilities, etc.). We place this problem in a general context, provide a general kernel-based methodology, and derive the asymptotic distributions of estimates of these population-level quantities, showing that in many cases the estimates are semiparametric efficient. For estimating the population mean with no missing data, we show that the sample mean is semiparametric efficient for canonical exponential families, but not in general. We apply the methods to a problem in nutritional epidemiology, where estimating the distribution of usual intake is of primary interest and semiparametric methods are not available. Extensions to the case of missing response data are also discussed.
UR - http://www.scopus.com/inward/record.url?scp=33947277465&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33947277465&partnerID=8YFLogxK
U2 - 10.1198/016214506000001103
DO - 10.1198/016214506000001103
M3 - Article
AN - SCOPUS:33947277465
SN - 0162-1459
VL - 102
SP - 123
EP - 139
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 477
ER -