TY - GEN
T1 - Random Forest Modeling for Survival Analysis of Cancer Recurrences
AU - Imani, Farhad
AU - Chen, Ruimin
AU - Tucker, Conrad
AU - Yang, Hui
N1 - Funding Information:
This research is funded in part by the NSF I/UCRC Center for Healthcare Organization Transformation (CHOT) award 1624727 and in part by the Susan G. Komen Foundation. The authors gratefully acknowledge the valuable contributions and suggestions from Dr. Jerome Jourquin and Dr. Stephanie Reffey for this research study. Any opinions, findings, or conclusions found in this paper are those of the authors and do not necessarily reflect the views of the sponsors.
Funding Information:
The SEER program, supported by the U.S. National Cancer Institute (NCI), collects data from tumor registries and covers 14% to 25% of the U.S. population. The breast cancer data include more than one and a half million records from different regions of the U.S. from 1973 to 2015. These registries' data are vital to analyzing and reporting the evolving burden of breast cancer in the population.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/8
Y1 - 2019/8
N2 - The recurrence of breast cancer is a prevailing problem that decreases the quality of patients' lives, creates high burdens on the healthcare system, and impacts the wellbeing of society. Advanced sensing provides an unprecedented opportunity to increase information visibility and characterize patterns of event occurrences. However, few, if any, previous works have investigated survival analysis of breast cancer recurrences based on the large amount of data readily available in the health system. There is a dire need to leverage data to decipher important factors that play a role in the recurrence of breast cancer. This paper presents an ensemble method of random survival forest for time-to-event analysis of breast cancer recurrences in the Surveillance, Epidemiology, and End Results (SEER) data from 1973 to 2015. Our model characterizes the survival function among patients with and without recurrences of breast cancer. Ensemble models are constructed via sampling and bootstrapping of the big data. Experimental results show that the age when cancer recurrence happens and time-between-recurrences approximately follow the Gaussian and exponential distributions with the means of 61.35 ± 14.03 and 2.61 years, respectively. In addition, the results show that age, surgery status, stage of tumors, and histological grade are significant factors that influence the probability of breast cancer recurrences. The proposed survival analysis approach shows strong potential to help healthcare practitioners in prognosis, treatment, and decision-making for breast cancer recurrences.
AB - The recurrence of breast cancer is a prevailing problem that decreases the quality of patients' lives, creates high burdens on the healthcare system, and impacts the wellbeing of society. Advanced sensing provides an unprecedented opportunity to increase information visibility and characterize patterns of event occurrences. However, few, if any, previous works have investigated survival analysis of breast cancer recurrences based on the large amount of data readily available in the health system. There is a dire need to leverage data to decipher important factors that play a role in the recurrence of breast cancer. This paper presents an ensemble method of random survival forest for time-to-event analysis of breast cancer recurrences in the Surveillance, Epidemiology, and End Results (SEER) data from 1973 to 2015. Our model characterizes the survival function among patients with and without recurrences of breast cancer. Ensemble models are constructed via sampling and bootstrapping of the big data. Experimental results show that the age when cancer recurrence happens and time-between-recurrences approximately follow the Gaussian and exponential distributions with the means of 61.35 ± 14.03 and 2.61 years, respectively. In addition, the results show that age, surgery status, stage of tumors, and histological grade are significant factors that influence the probability of breast cancer recurrences. The proposed survival analysis approach shows strong potential to help healthcare practitioners in prognosis, treatment, and decision-making for breast cancer recurrences.
UR - http://www.scopus.com/inward/record.url?scp=85072990514&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072990514&partnerID=8YFLogxK
U2 - 10.1109/COASE.2019.8843271
DO - 10.1109/COASE.2019.8843271
M3 - Conference contribution
AN - SCOPUS:85072990514
T3 - IEEE International Conference on Automation Science and Engineering
SP - 399
EP - 404
BT - 2019 IEEE 15th International Conference on Automation Science and Engineering, CASE 2019
PB - IEEE Computer Society
T2 - 15th IEEE International Conference on Automation Science and Engineering, CASE 2019
Y2 - 22 August 2019 through 26 August 2019
ER -