We present Raptor, a system for approximately searching many queries, such as next-generation sequencing reads or transcripts, in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of interleaved Bloom filters (IBFs) as its set-membership data structure, and probabilistic thresholding for minimizers. Our approach allows the IBF to be compressed and partitioned, enabling effective use of secondary memory. We evaluate the performance and limitations of the new features on simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications. We show this by re-implementing the distributed read mapping tool DRE...
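The three ingredients named above (winnowing minimizers, an interleaved Bloom filter, and a count threshold over minimizers) can be illustrated with a minimal Python sketch. It assumes uppercase ACGT input, hashes forward-strand k-mers only with a SplitMix64 mixer, and stores the IBF as one b-bit integer per bit position, so a single lookup answers for every bin at once. It replaces Raptor's probabilistic thresholding with a hypothetical fixed-fraction threshold (`fraction`) and omits canonical k-mers, compression, and partitioning. All names (`winnowing_minimizers`, `InterleavedBloomFilter`, `query`) are illustrative and are not Raptor's API.

```python
import math

MASK64 = (1 << 64) - 1
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def mix64(x: int) -> int:
    """SplitMix64 finalizer: a deterministic 64-bit integer hash."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def kmer_hashes(seq: str, k: int) -> list[int]:
    """Hash every k-mer of an uppercase ACGT string (forward strand only)."""
    mask = (1 << (2 * k)) - 1
    value, hashes = 0, []
    for i, base in enumerate(seq):
        value = ((value << 2) | BASE_CODE[base]) & mask  # 2-bit rolling encoding
        if i >= k - 1:
            hashes.append(mix64(value))
    return hashes


def winnowing_minimizers(seq: str, k: int, w: int) -> list[int]:
    """Smallest k-mer hash in each window of w bases; consecutive repeats collapsed."""
    hashes = kmer_hashes(seq, k)
    span = w - k + 1  # number of k-mers per window
    minimizers: list[int] = []
    for start in range(len(hashes) - span + 1):
        smallest = min(hashes[start:start + span])
        if not minimizers or minimizers[-1] != smallest:
            minimizers.append(smallest)
    return minimizers


class InterleavedBloomFilter:
    """b bins of m bits each; rows[p] holds bit p of every bin as one b-bit integer."""

    def __init__(self, bins: int, bits_per_bin: int, num_hashes: int):
        self.bins, self.size, self.num_hashes = bins, bits_per_bin, num_hashes
        self.rows = [0] * bits_per_bin

    def _positions(self, value: int) -> list[int]:
        # Derive num_hashes positions by hashing the value with different offsets.
        return [mix64((value + i * 0x632BE59BD9B4E019) & MASK64) % self.size
                for i in range(self.num_hashes)]

    def insert(self, bin_id: int, value: int) -> None:
        for p in self._positions(value):
            self.rows[p] |= 1 << bin_id

    def bulk_contains(self, value: int) -> int:
        """Bitmask of bins that may contain value (no false negatives)."""
        result = (1 << self.bins) - 1
        for p in self._positions(value):
            result &= self.rows[p]  # one AND covers all bins at this position
        return result

    def count(self, values: list[int]) -> list[int]:
        """Per-bin number of query values reported as present."""
        counts = [0] * self.bins
        for v in values:
            hits = self.bulk_contains(v)
            for b in range(self.bins):
                counts[b] += (hits >> b) & 1
        return counts


def query(ibf: InterleavedBloomFilter, read: str, k: int, w: int, fraction: float) -> list[int]:
    """Bins sharing at least `fraction` of the read's minimizers.

    Simplification: a fixed fraction stands in for Raptor's probabilistic threshold.
    """
    minimizers = winnowing_minimizers(read, k, w)
    counts = ibf.count(minimizers)
    needed = max(1, math.ceil(fraction * len(minimizers)))
    return [b for b, c in enumerate(counts) if c >= needed]
```

Because a Bloom filter has no false negatives, a read that is an exact substring of a reference always reaches the threshold for that reference's bin: every window of the read is also a window of the reference, so its minimizers were all inserted. Other bins can still be reported through false positives, at a rate governed by `bits_per_bin` and `num_hashes`.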
Background: High Throughput Sequencing (HTS) is now heavily exploited for genom...
Motivation: High-throughput next-generation sequencing can generate huge sequence files, whose analy...
DNA sequence comparison and database search have evolved in the last years as a field of strong comp...
Summary: We present Raptor, a system for approximately searching many queries such as next-generatio...
Searching sequences in large, distributed databases is the most widely used bioinformatics analysis ...
In the biological sciences, sequence analysis refers to analytical investigations that use nucleic a...
Motivation: Comparison of nucleic acid and protein sequences is a fundamental tool of modern bioinfo...
Storing and processing large DNA sequences has always been a major problem due to the increasing volu...
Motivation: The ever-growing size of sequencing data is a major bottleneck in bioinformatics as th...
The growing volume of generated DNA sequencing data makes the problem of its long-term storage incre...
Motivation: Counting the frequencies of k-mers in read libraries is often a first step in the analys...
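As a point of reference for the counting step mentioned above, here is a minimal in-memory sketch of exact k-mer counting with a hash table. It is an illustration only, not the method of the work summarized here; the function names `canonical` and `count_kmers` are hypothetical, and counting each k-mer together with its reverse complement (canonical k-mers) is an assumed, commonly used convention.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads, k: int, use_canonical: bool = True) -> Counter:
    """Count every k-mer across all reads, optionally merging strands."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not VALID.issuperset(kmer):
                continue  # skip k-mers containing N or other non-ACGT symbols
            counts[canonical(kmer) if use_canonical else kmer] += 1
    return counts
```

A plain dictionary keeps one entry per distinct k-mer, so memory grows with the number of distinct k-mers in the library; that growth is what makes counting large read libraries a problem worth dedicated tools.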
Biology researchers have a pressing need for data management technologies that will make the storag...
Next Generation Sequencing machines are generating millions of short DNA sequences (reads) every day...
Searching for matches between large collections of short (14-30 nucleotides) words and sequence data...
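A simple baseline for the problem stated above is to index the short words in a hash table grouped by length and slide a window of each length along the sequence. The sketch below finds exact occurrences only; it is an assumed baseline for illustration, not the approach of the summarized work, and `find_word_matches` is a hypothetical name.

```python
from collections import defaultdict


def find_word_matches(words: list[str], sequence: str) -> list[tuple[int, int]]:
    """Return sorted (word_index, position) pairs for every exact occurrence."""
    # Group words by length so each window length needs one pass over the sequence.
    by_length: dict[int, dict[str, list[int]]] = defaultdict(dict)
    for idx, word in enumerate(words):
        by_length[len(word)].setdefault(word, []).append(idx)

    hits = []
    for length, table in by_length.items():
        for pos in range(len(sequence) - length + 1):
            # Constant expected time per window, independent of the number of words.
            for idx in table.get(sequence[pos:pos + length], ()):
                hits.append((idx, pos))
    return sorted(hits)
```

For words of 14 to 30 nucleotides this makes at most 17 passes over the sequence, one per distinct length, regardless of how many words are in the collection.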
Summary: DNA sequence comparison and database search have evolved in the last years as a field...