TY - GEN
T1 - Optimal kernel choice for large-scale two-sample tests
AU - Gretton, Arthur
AU - Sriperumbudur, Bharath
AU - Sejdinovic, Dino
AU - Strathmann, Heiko
AU - Balakrishnan, Sivaraman
AU - Pontil, Massimiliano
AU - Fukumizu, Kenji
PY - 2012
Y1 - 2012
N2 - Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.
AB - Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.
UR - http://www.scopus.com/inward/record.url?scp=84877753617&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84877753617&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84877753617
SN - 9781627480031
T3 - Advances in Neural Information Processing Systems
SP - 1205
EP - 1213
BT - Advances in Neural Information Processing Systems 25
T2 - 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
Y2 - 3 December 2012 through 6 December 2012
ER -