Abstract. Background: Detecting highly divergent or as-yet unknown viruses in metagenomic sequencing datasets is a major bioinformatics challenge. When human samples are sequenced, a large proportion of assembled contigs are classified as “unknown” because conventional methods find no similarity to known sequences. We wished to explore whether machine learning algorithms using Relative Synonymous Codon Usage (RSCU) frequencies could improve the detection of viral sequences in metagenomic sequencing data. Results: We trained Random Forest and Artificial Neural Network classifiers on metagenomic sequences taxonomically classified into virus and non-virus classes. The algorithms achieved accuracies well beyond chance level, with an area under the ROC curve (AUC) of 0.79. Tw...
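RSCU has a standard definition: for amino acid i with n_i synonymous codons, RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij), where X_ij is the count of codon j. A value of 1 means the codon is used exactly as often as expected under uniform synonymous usage. The following minimal sketch computes RSCU for one sequence. It assumes the input is already an in-frame coding sequence and uses the standard genetic code (NCBI table 1). It also groups the three stop codons as their own class and assigns 0.0 to codons of amino acids that never occur. These choices, and the helper names `rscu` and `rscu_vector`, are hypothetical illustrations, not the exact preprocessing used in the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode (assumption: stop codons form one group, "*").
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def rscu(sequence: str) -> dict[str, float]:
    """Hypothetical helper: Relative Synonymous Codon Usage for one sequence.

    RSCU_ij = X_ij / ((1 / n_i) * sum_j X_ij), where X_ij is the count of codon j
    for amino acid i and n_i is the number of synonymous codons for amino acid i.
    Assumption: codons of amino acids absent from the sequence get RSCU 0.0.
    """
    seq = sequence.upper().replace("U", "T")
    # Assumption: the sequence is in reading frame 0; codons with non-ACGT symbols are skipped.
    counts = Counter(
        seq[i:i + 3]
        for i in range(0, len(seq) - 2, 3)
        if seq[i:i + 3] in CODON_TABLE
    )
    result: dict[str, float] = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        expected = total / len(codons)  # per-codon count under uniform synonymous usage
        for c in codons:
            result[c] = counts[c] / expected if total else 0.0
    return result


def rscu_vector(sequence: str) -> list[float]:
    """Hypothetical helper: fixed-order 64-value feature vector (codons sorted alphabetically)."""
    values = rscu(sequence)
    return [values[c] for c in sorted(values)]


if __name__ == "__main__":
    # ATG once; alanine codons GCT twice and GCC once.
    print(rscu("ATGGCTGCTGCC")["GCT"])
```

In this example the four alanine codons share 3 observations, so the expected count per codon is 0.75 and GCT scores 2 / 0.75 ≈ 2.67. A fixed codon order matters because each contig must map to the same 64 feature columns before the rows are passed to a classifier such as scikit-learn's `RandomForestClassifier`. How the study framed, filtered, or labelled its contigs is not reproduced here.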
Tools allowing for the identification of viral sequences in host-associated and environmental metage...
Collectively, viruses have the greatest genetic diversity on Earth, occupy extremely varied niches a...
Here we present MARVEL, a tool for prediction of double-stranded DNA bacteriophage sequences in meta...
More than 2 million cancer cases around the world each year are caused by viruses. In addition, ther...
Estimating the taxonomic composition of viral sequences in a biological sample processed by next-ge...
Despite its clinical importance, detection of highly divergent or yet unknown viruses is a major cha...
The 2019 novel coronavirus (renamed SARS-CoV-2, and generally referred to as the COVID-19 virus) has...
Abstract. Background: Identifying viral sequences in mixed metagenomes containing both viral and host ...
High-throughput sequencing technologies have greatly enabled the study of genomics, transcriptomics ...
Shotgun sequencing of environmental DNA (i.e., metagenomics) has revolutionize...
The world is grappling with the COVID-19 pandemic caused by SARS-CoV-2, the novel coronavirus that emerged in 2019. To better und...
The use of machine learning within the field of metagenomic classification is becoming more relevant...
The analysis of large microbiome data sets holds great promise for the delineation of the biological...
Next-generation, high-throughput sequencing has revolutionised the way in which we are able to view ...