TY - JOUR
T1 - A performance analysis of dimensionality reduction algorithms in machine learning models for cancer prediction
AU - Kabir, Md Faisal
AU - Chen, Tianjie
AU - Ludwig, Simone A.
N1 - Publisher Copyright: © 2022 The Author(s)
PY - 2023/11
Y1 - 2023/11
N2 - Developments in technology facilitate the use of machine learning methods in medical fields. In cancer research, the combination of machine learning tools and gene expression data has proven its ability to detect cancer patients. However, processing such high-dimensional and complex data is still a challenge. This paper analyzed the impact different dimensionality reduction techniques have on machine learning models used for cancer prediction. Dimensionality reduction techniques such as principal component analysis (PCA), PCA with a kernel, and autoencoder were utilized to reduce the dimensionality of the RNA sequencing data. Two machine learning classifiers, namely neural network and support vector machine, were trained and tested using the original, dimensionally reduced, and cancer-relevant data. Various metrics, such as accuracy, precision, recall, F-Measure, receiver operating characteristic curve, and area under the curve, were used to assess the performance of classifiers. The results showed that dimensionality reduction positively affects the performance of the classifiers. Additionally, autoencoder performed better than PCA and PCA with a kernel. These findings indicate the potential of dimensionality reduction in improving the analytical results of machine learning classification models on high-dimensional data.
AB - Developments in technology facilitate the use of machine learning methods in medical fields. In cancer research, the combination of machine learning tools and gene expression data has proven its ability to detect cancer patients. However, processing such high-dimensional and complex data is still a challenge. This paper analyzed the impact different dimensionality reduction techniques have on machine learning models used for cancer prediction. Dimensionality reduction techniques such as principal component analysis (PCA), PCA with a kernel, and autoencoder were utilized to reduce the dimensionality of the RNA sequencing data. Two machine learning classifiers, namely neural network and support vector machine, were trained and tested using the original, dimensionally reduced, and cancer-relevant data. Various metrics, such as accuracy, precision, recall, F-Measure, receiver operating characteristic curve, and area under the curve, were used to assess the performance of classifiers. The results showed that dimensionality reduction positively affects the performance of the classifiers. Additionally, autoencoder performed better than PCA and PCA with a kernel. These findings indicate the potential of dimensionality reduction in improving the analytical results of machine learning classification models on high-dimensional data.
UR - http://www.scopus.com/inward/record.url?scp=85148470428&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85148470428&partnerID=8YFLogxK
U2 - 10.1016/j.health.2022.100125
DO - 10.1016/j.health.2022.100125
M3 - Article
AN - SCOPUS:85148470428
SN - 2772-4425
VL - 3
JO - Healthcare Analytics
JF - Healthcare Analytics
M1 - 100125
ER -