TY - JOUR

T1 - A look at multiplicity through misclassification

AU - Dasgupta, Nairanjana

AU - Genz, Alan

AU - Lazar, Nicole A.

N1 - Publisher Copyright:
© 2015, Indian Statistical Institute.
Copyright:
Copyright 2018 Elsevier B.V., All rights reserved.

PY - 2016

Y1 - 2016

N2 - Multiplicity in large-scale studies using, for example, microarray genomic data and functional neuroimaging data, has been an extensively researched topic in recent years. One option often used by researchers in practice is a “top-r table”, which involves ranking the hypotheses in some order (by p-values or test statistics) and reporting the top r results. This has immediate practical applications, as it yields a list of “interesting” results that are worth following up, irrespective of the actual p-value (adjusted or not). In this manuscript we take another look at multiplicity using top-r tables. Our approach is intended to be a compromise between theory and practice. We look at the relationship between the probability of correct classification, which we call r-power (the probability that the units picked in the top-r table do indeed come from the alternative), and the value of r. We analytically define r-power in terms of order statistics and quantify the probability of correct classification. We use numerical integration to calculate r-power as a function of effect size, δ; the number of hypotheses tested, N; the number of hypotheses coming from the null, k; and r. Our results indicate that r-power is positively related to effect size and negatively related to k/N. The relationship to r depends upon whether r < k. There are two possible uses of our results: based on a pre-chosen r-power, we can calculate r and decide on the number of hypotheses to be followed up; or, if r is chosen using some other criterion, we can use our method to calculate r-power in that context. We illustrate these ideas using examples from microarrays and functional magnetic resonance imaging data.

UR - http://www.scopus.com/inward/record.url?scp=84986300584&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84986300584&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84986300584

SN - 0972-7671

VL - 78B

SP - 96

EP - 118

JO - Sankhya: The Indian Journal of Statistics

JF - Sankhya: The Indian Journal of Statistics

ER -