Searching sequences in large, distributed databases is the most widely used bioinformatics analysis. This basic task urgently needs solutions that cope with the exponential growth of sequence repositories and answer approximate queries very quickly. In this paper, we present a novel data structure: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space-efficient, yet general enough to serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving comparable or better accuracy than other state-of-the-art tools (Mantis and Bifrost). The HIBF builds an index up to 211 times faster, using up t...
This paper presents a seed-based algorithm for intensive DNA sequence comparis...
Bloom filters are widely used in genome assembly, IoT applications and several network applications ...
A Bloom Filter is an efficient randomized data structure for membership queries on a set with a cert...
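As a rough illustration of the membership-query idea mentioned in the snippet above, the following is a minimal Bloom filter sketch in Python. It is not taken from any of the listed papers: the class name `BloomFilter`, the helper `kmers`, the SHA-256-based double hashing, and the sizing parameters are all assumptions chosen for the example.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter; names and hashing scheme are assumptions."""

    def __init__(self, capacity, fp_rate):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2
        self.num_bits = max(1, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(sequence, k):
    """Yield all overlapping substrings of length k (hypothetical helper)."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


if __name__ == "__main__":
    bf = BloomFilter(capacity=1000, fp_rate=0.01)
    for kmer in kmers("ACGTACGTTGCA", 4):
        bf.add(kmer)
    print("ACGT" in bf)  # inserted k-mers are always reported as present
```

A filter like this never returns a false negative for an inserted item, but it can return a false positive with a probability governed by `fp_rate`, which is the trade-off the snippet alludes to.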
We present Raptor, a system for approximately searching many queries such as next-generation sequenc...
In the biological sciences, sequence analysis refers to analytical investigations that use nucleic a...
Motivation: The ever-growing size of sequencing data is a major bottleneck in bioinformatics as th...
Storing and processing of large DNA sequences has always been a major problem due to increasing volu...
In this paper, we present two novel hash-based indexing structures, based on Bloom filters, called B...
When indexing large collections of short-read sequencing data, a common operat...
Bytewise approximate matching algorithms have in recent years shown significant promise in detecting...
The ubiquity of next generation sequencing has transformed the size and nature...
With High Throughput Sequencing (HTS) technologies, biology is experiencing a ...
Genomic and metagenomic fields, generating huge sets of short genomic sequenc...
Sequence data is one of the rapidly growing types of data. New efficient and scalable techniques are...
Motivation: Detection of maximal exact matches (MEMs) between two long sequences is a fundamental prob...