A Robust and Efficient Statistical Framework for Handling Missing-Not-At-Random Data in Patient Reported Outcomes and Beyond

  • Zhao, Jiwei J. (PI)
  • Ma, Yanyuan (CoPI)
  • Bisson, Leslie L. (CoPI)


Project Details


A patient-reported outcome (PRO) is a report of the patient's health status that comes directly from the patient, without interpretation by a clinician or anyone else. Because it describes health status from the patient's own viewpoint, PRO research holds great promise for informing clinical and policy decisions and for improving the quality and efficiency of healthcare. However, the quality and value of PRO data depend on a number of factors, one of which is the missing-not-at-random (MNAR) issue. For instance, patients might fail to complete a depression survey because of their level of depression, or sicker patients may be less likely to complete a quality-of-life questionnaire. In such cases, the PROs are missing because of the patient's declining health status, but the extent of the decline is unknown because it is not observed; hence, the missingness is informative and the data are MNAR. Similar situations arise in large-scale health surveys and electronic health record databases. In this project, the PIs will study statistical methodology and computational algorithms for the MNAR issue in PROs and in similar settings. The research products have the potential to be applied in various areas, such as Alzheimer's disease, mental health disorders, orthopedics, and pain research. The PIs will also engage in education at both disciplinary and interdisciplinary levels, with beneficiaries ranging from local high school students and undergraduates to master's and PhD students and biomedical investigators. The project will also provide research opportunities for postdoctoral scholars.
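To make the informative-missingness argument concrete, the following is a minimal Python simulation under entirely hypothetical assumptions: a health score drawn from a normal distribution with mean 50 and standard deviation 10 (higher meaning healthier), and a response probability that rises with the unobserved score itself along an illustrative logistic curve. Because whether a score is missing depends on the score, the mean of the observed responses overstates the mean of all outcomes. The sketch only illustrates the bias of a naive complete-case analysis under MNAR; it is not the shadow variable methodology the PIs propose.

```python
import math
import random
import statistics


def simulate_mnar(n=10_000, seed=0):
    """Simulate hypothetical PRO scores with outcome-dependent (MNAR) missingness."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        # Hypothetical health score; higher = healthier.
        y = rng.gauss(50.0, 10.0)
        # Response probability depends on y itself, so missingness is MNAR:
        # sicker patients (low y) are less likely to complete the questionnaire.
        p_respond = 1.0 / (1.0 + math.exp(-(y - 50.0) / 5.0))
        full.append(y)
        if rng.random() < p_respond:
            observed.append(y)
    return full, observed


full, observed = simulate_mnar()
print(f"mean of all outcomes:      {statistics.mean(full):.2f}")
print(f"mean of observed outcomes: {statistics.mean(observed):.2f}")
print(f"response rate:             {len(observed) / len(full):.2%}")
```

Running it prints both means and the response rate; the observed mean should come out noticeably higher than the full-data mean, which is the bias a complete-case analysis would silently carry when the missingness is informative.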

The overarching goal of this project is to establish a groundbreaking, translational statistical framework that includes both robust methods and efficient estimators, in which assumptions on the missing-data mechanism are kept to a minimum so that the resulting methods can be applied as flexibly as possible. Motivated by the well-recognized fact that there is no adequate way to test the correctness of the missing-data mechanism, the PIs will adopt the shadow variable approach (a shadow variable is a fully observed variable that is associated with the outcome but, given the outcome and covariates, unrelated to whether the outcome is missing) to achieve model identification while making essentially no further assumptions on the mechanism, thereby providing the greatest possible protection against model misspecification. The methodology is robust against misspecification of the mechanism model by leveraging the model-based likelihood and its associated semiparametric structure. The statistical methods developed in this project will be implemented in efficient R packages with user-friendly interfaces for researchers whose primary goal is the analysis of missing data, especially MNAR data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Effective start/end date: 8/1/20 to 3/31/21


  • National Science Foundation: $366,166.00

