TY - GEN

T1 - Hypothesis testing using pairwise distances and associated kernels

AU - Sejdinovic, Dino

AU - Gretton, Arthur

AU - Sriperumbudur, Bharath

AU - Fukumizu, Kenji

PY - 2012

Y1 - 2012

AB - We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. The equivalence holds when energy distances are computed with semimetrics of negative type, in which case a kernel may be defined such that the RKHS distance between distributions corresponds exactly to the energy distance. We determine the class of probability distributions for which kernels induced by semimetrics are characteristic (that is, for which embeddings of the distributions to an RKHS are injective). Finally, we investigate the performance of this family of kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.

UR - http://www.scopus.com/inward/record.url?scp=84867127400&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867127400&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84867127400

SN - 9781450312851

T3 - Proceedings of the 29th International Conference on Machine Learning, ICML 2012

SP - 1111

EP - 1118

BT - Proceedings of the 29th International Conference on Machine Learning, ICML 2012

T2 - 29th International Conference on Machine Learning, ICML 2012

Y2 - 26 June 2012 through 1 July 2012

ER -