## Predicting pathogenic potentials from NGS reads: novel bacterial species

This repository contains simulated Illumina read datasets for bacterial pathogenic potential prediction and associated metadata extracted from the IMG Database (https://img.jgi.doe.gov/). The reads are 250 bp long and were simulated with Mason (https://www.seqan.de/apps/mason/) from genomes downloaded from NCBI. The training-validation-test split was done at the species level to ensure the "novelty" of the validation and test species. The training sets contain 10 million reads per class, the validation sets 1.25 million reads per class, and the test sets 1.25 million paired reads per class. Additional imbalanced training sets contain 2.5 million "nonpathogenic" and 17.5 millio...
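A species-level split means that every genome of a given species lands in exactly one of the training, validation, or test sets, so the validation and test species are never seen during training. The sketch below illustrates that idea under stated assumptions: the `(genome_id, species, label)` record format, the `species_level_split` helper, the split fractions, and the seed are all hypothetical and are not taken from this dataset, whose actual species assignment is not described here. It also assumes each species carries a single class label.

```python
import random
from collections import defaultdict


def species_level_split(genomes, fractions=(0.8, 0.1, 0.1), seed=42):
    """Assign whole species to train/validation/test so no species is shared.

    genomes: iterable of (genome_id, species, label) tuples (hypothetical format).
    fractions: hypothetical train/validation/test proportions of species per label.
    Assumes each species has exactly one label.
    """
    genomes = list(genomes)  # materialize so the input can be iterated twice
    rng = random.Random(seed)

    # Collect the distinct species for each class label.
    species_by_label = defaultdict(set)
    for _, species, label in genomes:
        species_by_label[label].add(species)

    # Shuffle species within each label and cut the list by the fractions.
    split_of_species = {}
    for label, species_set in species_by_label.items():
        species_list = sorted(species_set)  # sort first for reproducible shuffling
        rng.shuffle(species_list)
        n = len(species_list)
        n_train = int(round(fractions[0] * n))
        n_val = int(round(fractions[1] * n))
        for i, species in enumerate(species_list):
            if i < n_train:
                split_of_species[species] = "train"
            elif i < n_train + n_val:
                split_of_species[species] = "validation"
            else:
                split_of_species[species] = "test"

    # Every genome inherits the split of its species.
    splits = {"train": [], "validation": [], "test": []}
    for genome_id, species, _ in genomes:
        splits[split_of_species[species]].append(genome_id)
    return splits
```

Splitting species separately within each label keeps both classes represented in every split; reads would then be simulated only from the genomes assigned to each split, which is the step that makes test species "novel" to a trained model.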
Simulated Illumina reads for SNP distance method evaluation and comparison used in the article "Comp...
The sequencing of the human genome has opened up completely new avenues in research and the notion o...
al lu making genome sequences of pathogens of clinical or times shorten, expansion of NGS into diagn...
This repository contains simulated Illumina read datasets for novel fungal pathogen prediction and r...
This repository contains simulated Illumina read datasets for novel human virus prediction and assoc...
Although the majority of bacteria are innocuous or even beneficial for their host, others are highly...
The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key ch...
Genomic islands (GIs), including pathogenicity islands, are commonly defined as clusters of genes in...
We review the level of genomic specificity regarding actinobacterial pathogenicity. As they occupy v...
This training dataset is from an imaginary Staphylococcus aureus bacterium with a miniature genome. ...
Technological advancements have led to an exponential increase in omics data generation. This data p...
Massively parallel sequencing of microbial genetic markers (MGMs) is used to uncover the species com...
The phylogenetic profile of a gene is a reflection of its evolutionary history and can be defined as...
Although the majority of bacteria are harmless or even beneficial to their host, others are ...
Although there have been great advances in understanding bacterial pathogenesis, there is st...