It is well known that non-small cell lung cancer (NSCLC) is a heterogeneous group of diseases. Previous studies have demonstrated genetic variation among different ethnic groups in the epidermal growth factor receptor (EGFR) in NSCLC. Research by our group and others has recently shown a lower frequency of EGFR mutations in African Americans with NSCLC, as compared to their White counterparts. In this study, we use our original study data of EGFR pathway genetics in African American NSCLC as an example to illustrate that univariate analyses based on aggregation versus partition of data leads to contradictory results, in order to emphasize the importance of controlling statistical confounding. We further investigate analytic approaches in logistic regression for data with separation, as is the case in our example data set, and apply appropriate methods to identify predictors of EGFR mutation. Our simulation shows that with separated or nearly separated data, penalized maximum likelihood (PML) produces estimates with smallest bias and approximately maintains the nominal value with statistical power equal to or better than that from maximum likelihood and exact conditional likelihood methods. Application of the PML method in our example data set shows that race and EGFR-FISH are independently significant predictors of EGFR mutation.
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Health Information Management