A performance analysis of dimensionality reduction algorithms in machine learning models for cancer prediction

Md Faisal Kabir, Tianjie Chen, Simone A. Ludwig

Research output: Contribution to journal › Article › peer-review

20 Scopus citations


Developments in technology facilitate the use of machine learning methods in medical fields. In cancer research, the combination of machine learning tools and gene expression data has proven able to identify cancer patients. However, processing such high-dimensional and complex data is still a challenge. This paper analyzed the impact that different dimensionality reduction techniques have on machine learning models used for cancer prediction. Three dimensionality reduction techniques, namely principal component analysis (PCA), PCA with a kernel, and an autoencoder, were used to reduce the dimensionality of the RNA sequencing data. Two machine learning classifiers, a neural network and a support vector machine, were trained and tested using the original, dimensionally reduced, and cancer-relevant data. Various metrics, such as accuracy, precision, recall, F-Measure, receiver operating characteristic curve, and area under the curve, were used to assess the performance of the classifiers. The results showed that dimensionality reduction positively affects the performance of the classifiers. Additionally, the autoencoder performed better than PCA and PCA with a kernel. These findings indicate the potential of dimensionality reduction to improve the results of machine learning classification models on high-dimensional data.
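As an illustration of the threshold-based metrics named in the abstract (accuracy, precision, recall, F-Measure), the following minimal sketch computes them from binary labels using their standard definitions. The function name `binary_metrics`, the `BinaryMetrics` container, and the example labels are hypothetical and are not taken from the paper; the ROC curve and area under the curve are omitted because they require classifier scores rather than hard predictions.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F-Measure for binary labels.

    Hypothetical helper for illustration; ratios with a zero denominator
    are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-Measure (F1) is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f_measure)


# Made-up labels for illustration only (1 = cancer, 0 = normal).
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]
example_result = binary_metrics(example_true, example_pred)
print(example_result)
```

In a comparison like the one the abstract describes, a helper of this kind would be applied to each classifier's predictions on the original and on each reduced representation, so that the resulting metric values can be compared side by side.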

Original language: English (US)
Article number: 100125
Journal: Healthcare Analytics
State: Published - Nov 2023

All Science Journal Classification (ASJC) codes

  • Analytical Chemistry
  • Health Informatics
