Summary: Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state-of-the-art k-mer counting methods require that a large data structure resides in memory. Such a structure typically grows with the number of distinct k-mers to count. We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which only requires a fixed user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The multi-set of all k-mers present in the reads is partitioned, and partitions are saved to disk. Then, each partition is separately loaded in memory in a temporary hash table. The k-mer counts...
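The partition-then-count idea described in this abstract can be illustrated with a minimal Python sketch. This is not the DSK implementation: the function name `count_kmers_partitioned`, the `n_partitions` parameter, the CRC32 partitioning hash and the plain-text partition files are all assumptions made for illustration, and the sketch ignores reverse complements and does not enforce any memory or disk budget.

```python
import os
import tempfile
import zlib
from collections import Counter


def count_kmers_partitioned(reads, k, n_partitions=4):
    """Yield (kmer, count) pairs, holding one partition's table in memory at a time.

    Hypothetical helper for illustration only; not the DSK code.
    """
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part_{i}.txt") for i in range(n_partitions)]
        files = [open(p, "w") for p in paths]
        try:
            # Pass 1: stream the reads and write every k-mer to the
            # partition file selected by a hash of the k-mer.
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    part = zlib.crc32(kmer.encode("ascii")) % n_partitions
                    files[part].write(kmer + "\n")
        finally:
            for f in files:
                f.close()

        # Pass 2: load each partition separately into a temporary hash
        # table. All copies of a given k-mer land in the same partition,
        # so each per-partition count is already final.
        for path in paths:
            table = Counter()
            with open(path) as f:
                for line in f:
                    table[line.rstrip("\n")] += 1
            yield from table.items()


if __name__ == "__main__":
    for kmer, count in sorted(count_kmers_partitioned(["ACGTACGT"], 3)):
        print(kmer, count)
```

Because identical k-mers always hash to the same partition, the peak size of the in-memory table depends on the largest partition rather than on the total number of distinct k-mers; raising `n_partitions` lowers memory use and creates more, smaller files on disk.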
The emergence of Next Generation Sequencing (NGS) platforms has increased the throughput of genomic ...
Current common storage media have limited ability to store data with present data explosion trends, w...
The growing volume of generated DNA sequencing data makes the problem of its long-term storage incre...
Motivation: Building the histogram of occurrences of every k-symbol long substring of nucleotide dat...
Motivation: A major challenge in next-generation genome sequencing (NGS) is to assemble massive ove...
Motivation: Counting the frequencies of k-mers in read libraries is often a first step in the analys...
Counting the frequencies of k-mers in read libraries is often a first step in the analysis of high-t...
Background: K-mer frequency counting is an upstream process of many bioinformatics data analysis wor...
Abstract Genomics data analysis requires efficient tools to address the vast amount of data generate...
Motivation: Several applications in bioinformatics, such as genome assemblers and error corrections ...
We propose a lightweight data structure for indexing and querying collections of NGS reads data in m...
This is a talk given in the context of the BSC Life Sessions. Abstract: k-mers are used on a daily b...