TY - JOUR
T1 - kmerDB
T2 - A database encompassing the set of genomic and proteomic sequence information for each species
AU - Mouratidis, Ioannis
AU - Baltoumas, Fotis A.
AU - Chantzi, Nikol
AU - Patsakis, Michail
AU - Chan, Candace S.Y.
AU - Montgomery, Austin
AU - Konnaris, Maxwell A.
AU - Aplakidou, Eleni
AU - Georgakopoulos, George C.
AU - Das, Anshuman
AU - Chartoumpekis, Dionysios V.
AU - Kovac, Jasna
AU - Pavlopoulos, Georgios A.
AU - Georgakopoulos-Soares, Ilias
N1 - Publisher Copyright: © 2024 The Authors
PY - 2024/12
Y1 - 2024/12
N2 - The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
AB - The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
UR - http://www.scopus.com/inward/record.url?scp=85192004516&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85192004516&partnerID=8YFLogxK
U2 - 10.1016/j.csbj.2024.04.050
DO - 10.1016/j.csbj.2024.04.050
M3 - Article
C2 - 38711760
AN - SCOPUS:85192004516
SN - 2001-0370
VL - 23
SP - 1919
EP - 1928
JO - Computational and Structural Biotechnology Journal
JF - Computational and Structural Biotechnology Journal
ER -