High Dimensional Statistical Inference in Flexible Response Surface Models for Product Formulation

Project: Research project

Project Details


Over the last few decades, multicomponent drugs have proved beneficial in the treatment of the most severe diseases, e.g., as anticancer or antiviral agents. More generally, a multicomponent product formulation problem requires determining not only the component proportions and their amounts (or doses) but also the manufacturing process conditions that accompany the production of such a formulation. With the introduction of new regulations by the Food and Drug Administration, pharmaceutical companies gained additional flexibility to make changes in their formulations and process operations. A key guideline in these regulations is that companies must specify a design space for their product/process for approval, i.e., a region of formulation and process operating conditions that guarantees quality. Determining such a design space poses several statistical inference challenges and is the goal of this research. This research also has important broader impacts in biology. Experiments in animals vary the components and amounts of diets and measure surrogates of 'fitness' and lifetime. These experiments have a structure similar to that of some experiments in drug development, and they too require flexible models. Collaboration with researchers from the pharmaceutical sector and biologists in academia will address these broader impacts.

The overall goal of this research is to contribute new statistical methodology for computing frequentist and Bayesian regions for the location of optima of flexible, supervised learning models that will aid in defining a design space for product formulation. The research will study methods for finding confidence regions for the optima of nonparametric models, which are functions in a high-dimensional space, without recourse to any distributional assumption. A very general Reproducing Kernel Hilbert Space construction will be adopted, which allows design spaces to be found from many of the different models used in practice. This research will also extend the theory of bootstrapping functions of parameters to the vector function case. The research includes an investigation of the data depth notion used to generate valid, unbiased, and small high-dimensional confidence regions for the parameters, which in turn will be used to find the desired design region.
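As a rough illustration of the idea of bootstrapping the location of an optimum and trimming the bootstrap cloud by depth, the following minimal sketch may help. It is not the project's methodology: it assumes a simple two-factor quadratic response surface in place of a kernel-based model, a residual bootstrap, and a Mahalanobis-type depth as a stand-in for more general data depth notions. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def design_matrix(X):
    # Full quadratic model in two factors: 1, x1, x2, x1^2, x2^2, x1*x2
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

def stationary_point(beta):
    # beta = [b0, b1, b2, b11, b22, b12]; solve gradient b + 2 B x = 0
    b = beta[1:3]
    B = np.array([[beta[3], beta[5] / 2.0],
                  [beta[5] / 2.0, beta[4]]])
    return np.linalg.solve(-2.0 * B, b)

def bootstrap_optima(X, y, n_boot=500, seed=0):
    # Residual bootstrap: resample centered residuals, refit, and
    # recompute the stationary point for each replicate.
    rng = np.random.default_rng(seed)
    F = design_matrix(X)
    beta_hat, *_ = np.linalg.lstsq(F, y, rcond=None)
    fitted = F @ beta_hat
    resid = y - fitted
    resid = resid - resid.mean()
    optima = np.empty((n_boot, 2))
    for i in range(n_boot):
        y_star = fitted + rng.choice(resid, size=len(y), replace=True)
        beta_star, *_ = np.linalg.lstsq(F, y_star, rcond=None)
        optima[i] = stationary_point(beta_star)
    return stationary_point(beta_hat), optima

def deepest_fraction(points, frac=0.95):
    # Mahalanobis-type depth (an assumed, simple choice of depth):
    # keep roughly the `frac` deepest bootstrap optima.
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    diff = points - center
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    depth = 1.0 / (1.0 + d2)
    keep = depth >= np.quantile(depth, 1.0 - frac)
    return points[keep]

# Synthetic experiment (assumed): true maximum at (0.3, -0.2)
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(60, 2))
y = (5.0 - (X[:, 0] - 0.3) ** 2 - 2.0 * (X[:, 1] + 0.2) ** 2
     + rng.normal(0.0, 0.05, size=60))

x_hat, optima = bootstrap_optima(X, y)
region = deepest_fraction(optima)
```

Here `region` holds the retained bootstrap optima, whose spread gives an approximate picture of the uncertainty in the optimum's location. A depth-based trim is used rather than coordinate-wise percentiles because the optimum is a vector and its coordinates are generally correlated.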

Effective start/end date: 8/15/16 to 7/31/20


  • National Science Foundation: $270,568.00

