Classification Models and Survival Analysis for Prostate Cancer Using RNA Sequencing and Clinical Data

Md Faisal Kabir, Simone A. Ludwig

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Scopus citations

Abstract

Early detection of cancer can significantly increase the chance of successful treatment. This research performs a study on early cancer detection for prostate cancer patients from whom cancer tissue was analyzed with Illumina Hi-Seq ribonucleic acid (RNA) Sequencing (RNA-Seq). Cancer relevant genes with the most significant correlations with the clinical outcome of the sample type (cancer /non-cancer) and the overall survival (OS) were assessed. Traditional cancer diagnosis primarily depends on physicians' experience to identify morphological abnormalities. Gene expression level data can assist physicians in detecting cancer cases at a much earlier stage and thus can significantly improve the potential of patient treatment. In this research, for the classification task, we applied machine learning and data mining approaches to detect cancer versus non-cancer based on gene expression data. Our goal was to detect cancer at the earliest stage. Besides, for the regression task, survival outcomes in prostate cancer patients were performed. Regression trees were built using cancer-sensitive genes along with clinical attribute 'Gleason score' as predictors, and the clinical variable 'overall survival' as the target variable. Knowledge in the form of rules is one of the vital tasks in data mining as it provides concise statements of easily understandable and potentially valuable information. For the classification model, we derived rules from a decision tree and interpreted these rules for cancer and non-cancer patients. For the regression or survival model, we generated rules for predicting or estimating the survival time of cancer patients. In this study, cancer-relevant genes were analyzed as predictors, although various genes may interact with genes currently known to contribute to cancer. These findings have implications for assessing gene-gene interactions and gene-environment interactions of prostate cancer as well as for other types of cancer.

Original languageEnglish (US)
Title of host publicationProceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
EditorsChaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2736-2745
Number of pages10
ISBN (Electronic)9781728108582
DOIs
StatePublished - Dec 2019
Event2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: Dec 9 2019Dec 12 2019

Publication series

NameProceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference2019 IEEE International Conference on Big Data, Big Data 2019
Country/TerritoryUnited States
CityLos Angeles
Period12/9/1912/12/19

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management

Fingerprint

Dive into the research topics of 'Classification Models and Survival Analysis for Prostate Cancer Using RNA Sequencing and Clinical Data'. Together they form a unique fingerprint.

Cite this