Supervised classification of genomic sequences is a challenging, well-studied problem with a variety of important applications. We propose an open-source, supervised, alignment-free, highly general method for sequence classification that operates on k-mer proportions of DNA sequences. This method was implemented in a fully standalone general-purpose software package called Kameris, publicly available under a permissive open-source license. Compared to competing software, ours provides key advantages in terms of data security and privacy, transparency, and reproducibility. We perform a detailed study of its accuracy and performance on a wide variety of classification tasks, including virus subtyping, taxonomic classification, and human haplo...
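As a rough illustration of the "k-mer proportions" representation mentioned above, the following is a minimal sketch, not the Kameris implementation. The function name `kmer_proportions`, the default `k=3`, and the choice to skip windows containing non-ACGT characters are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_proportions(seq, k=3):
    """Return a fixed-length vector with the proportion of each possible k-mer.

    Hypothetical helper for illustration only: windows containing any
    character other than A, C, G, T are skipped (an assumption).
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    # Count every overlapping window of length k that is made only of A/C/G/T.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    # Enumerate all 4**k k-mers in a fixed order so vectors are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

Because every sequence maps to a vector of the same length (4^k entries), vectors of this kind can be passed to a standard supervised classifier without aligning the sequences first.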
As cost and throughput of second-generation sequencers continue to improve, even modestly resourced ...
In recent years, the decreasing cost of ‘Next generation’ sequencing has spawned numerous applicatio...
As of October 2020, there are 18.6 × 10^15 DNA base pairs publicly available in the Sequence Read Arc...
An organism’s DNA sequence is a virtual cornucopia of information, and sequencing technology is the ...
In the field of bioinformatics, taxonomic classification is the scientific practice of identifying, ...
The classification of DNA sequences is a key research area in bioinformatics as it enables researche...
High-throughput DNA sequencing has revolutionised microbiology and is the foundation on which the na...
The study proposes a novel model for DNA sequence classification that combines machine lear...
Biological sequence datasets are increasing at a prodigious rate. The volume of data in these datase...
The perpetually increasing rate at which viral full-genome sequences are being determined is creatin...
Through the study of genomic sequences, researchers are able to learn much about the workings of lif...
For many disease-causing virus species, global diversity is clustered into a taxonomy of subtypes wi...
Unexpected growth of high-throughput sequencing platforms in recent years impacted virtually all are...
Background: Although software tools abound for the comparison, analysis, identification, and...
Background: A fundamental problem in modern genomics is to taxonomically or functionally classify DN...