Explainable machine learning approach for cancer prediction through binarilization of RNA sequencing data

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

In recent years, researchers have demonstrated the effectiveness and speed of machine learning-based cancer diagnosis models. However, it is difficult to explain the results generated by machine learning models, especially those that use complex high-dimensional data such as RNA sequencing data. In this study, we propose the binarilization technique as a novel way to treat RNA sequencing data and use it to construct explainable cancer prediction models. We tested our proposed data processing technique on five different models, namely neural network, random forest, XGBoost, support vector machine, and decision tree, using four cancer datasets collected from the National Cancer Institute Genomic Data Commons. Because our datasets are imbalanced, we evaluated the performance of all models using metrics suited to imbalanced data: geometric mean, Matthews correlation coefficient, F-measure, and area under the receiver operating characteristic curve. Our approach showed comparable performance while relying on fewer features. Additionally, we demonstrated that data binarilization offers higher explainability by revealing how each feature affects the prediction. These results demonstrate the potential of the data binarilization technique to improve the performance and explainability of RNA sequencing-based cancer prediction models.
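
As a rough illustration of the two ideas in the abstract, the sketch below binarizes an expression matrix and computes three of the imbalance-aware metrics named above. The abstract does not state how the threshold is chosen, so the per-gene median cutoff here is purely an assumption, and the function names `binarize` and `imbalance_metrics` are hypothetical, not taken from the paper. The area under the ROC curve is omitted because it requires continuous prediction scores rather than hard labels.

```python
from math import sqrt
from statistics import median


def binarize(matrix):
    """Map each value to 1 if it exceeds its gene's median, else 0.

    ASSUMPTION: the per-gene median threshold is illustrative only;
    the abstract does not specify the paper's actual cutoff.
    `matrix` is a list of samples, each a list of per-gene values.
    """
    n_genes = len(matrix[0])
    thresholds = [median(row[g] for row in matrix) for g in range(n_genes)]
    return [[1 if row[g] > thresholds[g] else 0 for g in range(n_genes)]
            for row in matrix]


def imbalance_metrics(y_true, y_pred):
    """Geometric mean, Matthews correlation coefficient, and F-measure
    for binary labels (1 = positive class, 0 = negative class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0

    # Geometric mean of the two per-class recalls.
    g_mean = sqrt(sensitivity * specificity)
    # F-measure (F1): harmonic mean of precision and sensitivity.
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    # MCC is defined as 0 here when any marginal count is zero.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"g_mean": g_mean, "mcc": mcc, "f_measure": f_measure}


# Toy data: three samples, two genes (values are made up).
expression = [[0.5, 10.0], [2.0, 3.0], [8.0, 1.0]]
binary_features = binarize(expression)

# Toy labels with a 2:4 class imbalance.
scores = imbalance_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

After binarization every feature takes only the values 0 or 1, which is what allows a model's dependence on a gene to be read as the effect of that gene being "high" or "low" under the chosen cutoff.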

Original language: English (US)
Article number: e0302947
Journal: PloS one
Volume: 19
Issue number: 5 May
State: Published - May 2024

All Science Journal Classification (ASJC) codes

  • General
