In the biological sciences, sequence analysis refers to analytical investigations that use nucleic acid or protein sequences to derive biological insights, such as their function, species of origin, or evolutionary relationships. However, sequences are not very meaningful by themselves; useful insights generally come from comparing them with other sequences. Indexing sequences with concepts borrowed from the computational sciences can help perform these comparisons. One such concept is a probabilistic data structure, the Bloom filter, which enables low-memory indexing with high computational efficiency at the cost of false-positive query results, because it stores a signature of a sequence rather than the sequence itself. This thesis expl...
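The Bloom-filter indexing idea described in this abstract can be illustrated with a small Python sketch. It assumes a plain (non-counting) Bloom filter over fixed-length k-mers. The class name `KmerBloomFilter`, the SHA-256-based hashing, and the parameters `num_bits`, `num_hashes` and `k` are hypothetical choices made for illustration and are not taken from the thesis. The sketch shows the two properties the abstract mentions: only bit signatures are stored, and queries can return false positives but never false negatives.

```python
import hashlib
import math


class KmerBloomFilter:
    """Hypothetical illustration of a Bloom filter over DNA k-mers (not the thesis's code)."""

    def __init__(self, num_bits: int, num_hashes: int, k: int) -> None:
        self.num_bits = num_bits      # m: size of the bit array
        self.num_hashes = num_hashes  # h: bits set per k-mer
        self.k = k                    # k-mer length
        self.bits = bytearray((num_bits + 7) // 8)
        self.num_items = 0            # k-mer insertions, duplicates included

    def _positions(self, kmer: str):
        # Derive h bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def _kmers(self, seq: str):
        seq = seq.upper()
        for start in range(len(seq) - self.k + 1):
            yield seq[start:start + self.k]

    def add_sequence(self, seq: str) -> None:
        # Only the bit "signature" of each k-mer is kept, never the sequence itself.
        for kmer in self._kmers(seq):
            for pos in self._positions(kmer):
                self.bits[pos // 8] |= 1 << (pos % 8)
            self.num_items += 1

    def contains_kmer(self, kmer: str) -> bool:
        # All h bits set -> "probably present"; any bit clear -> definitely absent.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(kmer.upper()))

    def query_sequence(self, seq: str) -> float:
        # Fraction of the query's k-mers reported present; false positives
        # can only push this value up, never down.
        kmers = list(self._kmers(seq))
        if not kmers:
            return 0.0
        return sum(self.contains_kmer(km) for km in kmers) / len(kmers)

    def expected_fpr(self) -> float:
        # Standard approximation (1 - e^(-h*n/m))^h; conservative if k-mers repeat,
        # because n counts duplicate insertions.
        h, m, n = self.num_hashes, self.num_bits, self.num_items
        return (1.0 - math.exp(-h * n / m)) ** h


bf = KmerBloomFilter(num_bits=1 << 16, num_hashes=3, k=5)
bf.add_sequence("ACGTACGTTAGC")
print(bf.contains_kmer("ACGTA"))  # True: inserted k-mers are never missed
```

Memory is fixed by `num_bits` regardless of how long the indexed sequences are. Raising `num_bits` or tuning `num_hashes` trades memory for a lower false-positive rate, as the `expected_fpr` approximation shows.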
Motivation: High-throughput next-generation sequencing can generate huge sequence files, whose analy...
Current common storage media have a limited ability to store data given present data-explosion trends, w...
This thesis investigates the impact of sequencing errors in post-sequence computational analyses, in...
Simple algorithms can be applied to a wide range of problems around us. With significant gr...
Biology researchers have a pressing need for data management technologies which will make the storag...
When indexing large collections of short-read sequencing data, a common operat...
Storing and processing large DNA sequences has always been a major problem due to the increasing volu...
With growing throughput and dropping cost of High-Throughput Sequencing (HTS) technologies, there is...
Characterizing the functional, structural, and evolutionary relationships of biological sequences is...
The study of biological and genetic information, mostly DNA data, is an extremely important subject ...
Motivation: Recent experimental studies on compressed indexes (BWT, CSA, FM-index) have confirmed th...
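This excerpt names the BWT and the FM-index but does not describe them. As a rough illustration only, the Python sketch below builds a Burrows-Wheeler transform by naively sorting rotations and counts pattern occurrences with FM-index-style backward search. The names `bwt` and `FMIndex`, the `$` sentinel, and the uncompressed occurrence tables are assumptions chosen for clarity. A practical FM-index samples or compresses these tables, and a compressed suffix array (CSA) is not shown.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform by sorting all rotations (O(n^2 log n); demo only)."""
    if "$" in text:
        raise ValueError("'$' is reserved as the end-of-text sentinel")
    s = text + "$"  # '$' sorts before uppercase letters in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    """Toy FM-index (hypothetical illustration): BWT plus the C and Occ tables."""

    def __init__(self, text: str) -> None:
        self.last = bwt(text)
        counts = Counter(self.last)
        # C[c]: number of characters in the text (with sentinel) smaller than c.
        self.C = {}
        total = 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]
        # occ[c][i]: occurrences of c in last[:i]. Stored in full here, whereas
        # practical FM-indexes keep sampled or succinct versions of this table.
        self.occ = {c: [0] * (len(self.last) + 1) for c in counts}
        for i, ch in enumerate(self.last):
            for c, table in self.occ.items():
                table[i + 1] = table[i] + (ch == c)

    def count(self, pattern: str) -> int:
        """Number of (possibly overlapping) occurrences of a non-empty pattern."""
        lo, hi = 0, len(self.last)  # half-open range of sorted rotations
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


index = FMIndex("ACGTACGTTAGC")
print(index.count("ACG"))  # 2
```

Backward search reads the pattern from right to left and narrows a single interval of sorted rotations per character. A count query therefore costs a number of table lookups proportional to the pattern length, independent of the text length. This property makes BWT-based indexes attractive for large sequence collections.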
Large corpora of texts or of sequences serve as references and are interrogate...
The article describes two new clustering algorithms for DNA nucleotide sequences, summarizes the res...