Abstract
The main contribution of the paper is to show that Gaussian sketching of a kernel Gram matrix K yields an operator whose counterpart in an RKHS H is a random projection operator, in the spirit of the Johnson-Lindenstrauss (J-L) lemma. To be precise, given a random matrix Z with i.i.d. Gaussian entries, we show that a sketch ZK corresponds to a particular random operator in the (infinite-dimensional) Hilbert space H that maps functions f ∈ H to a low-dimensional space R^d, while preserving a weighted RKHS inner-product of the form ⟨f, g⟩_Σ := ⟨f, Σg⟩_H, where Σ is the covariance operator induced by the data distribution. In particular, under assumptions similar to those of kernel PCA (KPCA) or kernel k-means (K-k-means), well-separated subsets of the feature space {K(·, x) : x ∈ X} remain well-separated after such an operation, which suggests benefits similar to those of KPCA and/or K-k-means, albeit at the much cheaper cost of a random projection. Moreover, our convergence rates suggest that, given a large dataset {X_1, …, X_N} of size N, we can build the Gram matrix K on a much smaller subsample of size n ≪ N, so that the sketch ZK is very cheap to obtain and subsequently apply as a projection operator on the original data {X_1, …, X_N}. We verify these insights empirically on synthetic data and on real-world clustering applications.
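As an illustration of the recipe described in the abstract, the following is a minimal NumPy sketch (not the authors' code). It assumes a Gaussian (RBF) kernel and hypothetical sizes N, n, d and bandwidth: it builds the Gram matrix K on a subsample of size n ≪ N, forms the Gaussian sketch ZK, and maps every point of the full dataset to R^d through its kernel evaluations on the subsample. Normalization constants and the exact operator scaling used in the paper are omitted.

```python
# Illustrative sketch only: Gaussian sketching of a kernel Gram matrix built on a
# subsample, applied as a low-dimensional projection of the full dataset.
# All sizes (N, n, d) and the bandwidth are hypothetical choices, not from the paper.
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

rng = np.random.default_rng(0)

# Full dataset of size N; the Gram matrix is built on a subsample of size n << N.
N, n, d = 10_000, 200, 50
X = rng.normal(size=(N, 5))
subsample = X[rng.choice(N, size=n, replace=False)]

# Gaussian sketch ZK: Z has i.i.d. N(0, 1/d) entries, K is the n x n Gram matrix.
K = rbf_kernel(subsample, subsample)
Z = rng.normal(scale=1.0 / np.sqrt(d), size=(d, n))
sketch = Z @ K                      # d x n; column j is the embedding of subsample point j

# Project the original data: each point x is mapped via its kernel evaluations
# k(x) = (K(x, x_1), ..., K(x, x_n)) on the subsample, then sketched by Z.
K_Xsub = rbf_kernel(X, subsample)   # N x n cross-kernel matrix
embedded = K_Xsub @ Z.T             # N x d low-dimensional representation
print(embedded.shape)               # (10000, 50)
```

Forming K and Z costs only O(n^2) and O(dn), and projecting the full dataset is a single N x n kernel evaluation followed by a matrix product, which is the computational saving the abstract alludes to.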
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 3928-3937 |
| Number of pages | 10 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 108 |
| State | Published - 2020 |
| Event | 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020 - Virtual, Online (Duration: Aug 26, 2020 → Aug 28, 2020) |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability