Naïve Bayes Classifiers and accompanying dataset for Pseudomonas syringae isolate characterization

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

The Pseudomonas syringae species complex (PSSC) is a diverse group of plant pathogens with a collective host range encompassing almost every food crop grown today. Because the complex threatens global food security, rapid detection and characterization of epidemic and emerging pathogenic lineages are essential. However, phylogenetic identification is often complicated by an unresolved and ever-changing taxonomy, which makes it difficult to use available databases in practice and to train classifiers properly. Consequently, although amplicon sequencing is a common method for routine identification of PSSC isolates, there is no efficient method for accurate classification based on these data. Here we present a suite of five Naïve Bayes classifiers for PCR primer sets widely used for PSSC identification, trained on in silico amplicon data from 2,161 published PSSC genomes using the Life Identification Number (LIN) hierarchical clustering algorithm in place of traditional Linnaean taxonomy. Additionally, we include a dataset for translating classification results back into traditional taxonomic nomenclature (i.e., species, phylogroup, pathovar) and for predicting virulence factor repertoires.
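To make the general idea concrete, the following is a minimal sketch of how a Naïve Bayes classifier can assign a label to a short DNA sequence from its k-mer content. It is an illustration only and not the authors' implementation: the function names (`kmers`, `train`, `classify`), the choice of k = 3, the add-one smoothing, the toy sequences, and the placeholder labels `group_A` and `group_B` are all assumptions made for this example, and the labels stand in for whatever cluster labels a real training set would carry.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(examples, k=3):
    """Count k-mers per label from (sequence, label) pairs."""
    kmer_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for seq, label in examples:
        label_counts[label] += 1
        for km in kmers(seq, k):
            kmer_counts[label][km] += 1
            vocab.add(km)
    return kmer_counts, label_counts, vocab


def classify(seq, model, k=3):
    """Return the label with the highest smoothed log posterior."""
    kmer_counts, label_counts, vocab = model
    n_examples = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        total = sum(kmer_counts[label].values())
        score = math.log(n / n_examples)  # log prior for this label
        for km in kmers(seq, k):
            # Add-one smoothing so an unseen k-mer does not zero out the score.
            score += math.log((kmer_counts[label][km] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy, made-up sequences and placeholder labels (not real PSSC data).
training = [
    ("ACGTACGTACGT", "group_A"),
    ("ACGTACGAACGT", "group_A"),
    ("TTGGCCTTGGCC", "group_B"),
    ("TTGGCCTAGGCC", "group_B"),
]
model = train(training)
prediction = classify("ACGTACGTAC", model)
```

In this sketch each k-mer is treated as conditionally independent given the label, which is the "naïve" assumption; a real workflow would train on amplicon sequences extracted from reference genomes and would typically rely on an established library rather than hand-written counting.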

Original language: English (US)
Article number: 178
Journal: Scientific Data
Volume: 11
Issue number: 1
State: Published - Dec 2024

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Information Systems
  • Education
  • Computer Science Applications
  • Statistics, Probability and Uncertainty
  • Library and Information Sciences
