Machine learning approaches to analyze histological images of tissues from radical prostatectomies

Arkadiusz Gertych, Nathan Ing, Zhaoxuan Ma, Thomas J. Fuchs, Sadri Salman, Sambit Mohanty, Sanica Bhele, Adriana Velásquez-Vacca, Mahul B. Amin, Beatrice S. Knudsen

Research output: Contribution to journalArticlepeer-review

83 Scopus citations


Computerized evaluation of histological preparations of prostate tissues involves identification of tissue components such as stroma (ST), benign/normal epithelium (BN) and prostate cancer (PCa). Image classification approaches have been developed to identify and classify glandular regions in digital images of prostate tissues; however their success has been limited by difficulties in cellular segmentation and tissue heterogeneity. We hypothesized that utilizing image pixels to generate intensity histograms of hematoxylin (H) and eosin (E) stains deconvoluted from H&E images numerically captures the architectural difference between glands and stroma. In addition, we postulated that joint histograms of local binary patterns and local variance (LBPxVAR) can be used as sensitive textural features to differentiate benign/normal tissue from cancer. Here we utilized a machine learning approach comprising of a support vector machine (SVM) followed by a random forest (RF) classifier to digitally stratify prostate tissue into ST, BN and PCa areas. Two pathologists manually annotated 210 images of low- and high-grade tumors from slides that were selected from 20 radical prostatectomies and digitized at high-resolution. The 210 images were split into the training (n=19) and test (n=191) sets. Local intensity histograms of H and E were used to train a SVM classifier to separate ST from epithelium (BN+PCa). The performance of SVM prediction was evaluated by measuring the accuracy of delineating epithelial areas. The Jaccard J=59.5±14.6 and RandRi=62.0±7.5 indices reported a significantly better prediction when compared to a reference method (Chen et al., Clinical Proteomics 2013, 10:18) based on the averaged values from the test set. To distinguish BN from PCa we trained a RF classifier with LBPxVAR and local intensity histograms and obtained separate performance values for BN and PCa: JBN=35.2±24.9, OBN=49.6±32, JPCa=49.5±18.5, OPCa=72.7±14.8 andRi=60.6±7.6 in the test set. Our pixel-based classification does not rely on the detection of lumens, which is prone to errors and has limitations in high-grade cancers and has the potential to aid in clinical studies in which the quantification of tumor content is necessary to prognosticate the course of the disease. The image data set with ground truth annotation is available for public use to stimulate further research in this area.

Original languageEnglish (US)
Pages (from-to)197-208
Number of pages12
JournalComputerized Medical Imaging and Graphics
StatePublished - Dec 1 2015

All Science Journal Classification (ASJC) codes

  • Radiological and Ultrasound Technology
  • Radiology Nuclear Medicine and imaging
  • Computer Vision and Pattern Recognition
  • Health Informatics
  • Computer Graphics and Computer-Aided Design

Cite this