Abstract
In this paper, we address the problem of modelling, or extracting information from, an unlabelled data sample. In many real-world applications, appropriate preprocessing transformations of high-dimensional input data can improve the overall performance of algorithms. Feature extraction seeks a compact description of the interesting features of the data. This can be useful for visualizing high-dimensional data in two or three dimensions, or for data compression. It can also serve as a preprocessing step that reduces the dimension of the data to be handled by a subsequent model. We concentrate mainly on kernel PCA for feature selection in a higher-dimensional feature space. We first introduce the usefulness of the EM algorithm for standard PCA. We then present kernel PCA, a nonlinear extension of PCA based on the kernel transformation (Scholkopf, Smola, and Muller 1997), which requires the eigenvalue decomposition of a so-called kernel matrix of size N×N. We propose an expectation-maximization approach for performing kernel principal component analysis, and we also introduce an online EM algorithm for PCA. We show that this approach is computationally efficient, especially when the number of data points is large. The information criteria of Bozdogan, together with others, are used to decide the number of eigenvalues to retain.
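
To make the idea of EM for standard PCA concrete, the following is a minimal sketch of the widely used zero-noise EM iteration for PCA, written with NumPy. It is an illustration under stated assumptions, not necessarily the formulation used in the chapter: the function name `em_pca`, its parameters, the fixed iteration count, and the random initialization are all hypothetical choices made here. The kernel and online variants mentioned in the abstract are not shown.

```python
import numpy as np

def em_pca(X, k, n_iter=200, seed=0):
    """Zero-noise EM iteration for PCA (illustrative sketch, assumed form).

    X: (d, N) array with one data point per column.
    k: number of principal components to estimate.
    Returns a (d, k) orthonormal basis of the estimated principal subspace.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)   # center the data
    d, _ = Xc.shape
    W = rng.standard_normal((d, k))          # random initial loading matrix
    for _ in range(n_iter):
        # E-step: latent coordinates Z = (W^T W)^{-1} W^T Xc, shape (k, N)
        Z = np.linalg.solve(W.T @ W, W.T @ Xc)
        # M-step: W = Xc Z^T (Z Z^T)^{-1}, shape (d, k)
        W = np.linalg.solve(Z @ Z.T, Z @ Xc.T).T
    Q, _ = np.linalg.qr(W)                   # orthonormalize the columns
    return Q
```

Each iteration in this sketch costs on the order of d·N·k operations and never forms the d×d covariance matrix, which is one reason an EM formulation can be attractive for large data sets. The returned columns span the estimated principal subspace; they are not individually ordered by variance.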
Original language | English (US) |
---|---|
Title of host publication | Statistical Data Mining and Knowledge Discovery |
Publisher | CRC Press |
Pages | 309-322 |
Number of pages | 14 |
ISBN (Electronic) | 9780203497159 |
ISBN (Print) | 9781584883449 |
State | Published - Jan 1 2003 |
All Science Journal Classification (ASJC) codes
- General Computer Science
- General Economics, Econometrics and Finance
- General Business, Management and Accounting