Recent advances in sequencing technology make it possible to produce billions of base pairs per day in the form of reads of length 100 bp and longer, and current developments promise the personal $1,000 genome in a couple of years. The analysis of these unprecedented amounts of data demands efficient data structures and algorithms. One such data structure is the substring index, which represents all substrings, or all substrings up to a certain length, contained in a given text. In this thesis we propose three substring indices, which we extend to be applicable to millions of sequences. We devise internal and external memory construction algorithms and a uniform framework for accessing the generalized suffix tree. Additionally, we propose different index-base...
International audience. Genomic and metagenomic fields, generating huge sets of short genomic sequence...
The growing volume of generated DNA sequencing data makes the problem of its long-term storage incre...
This thesis addresses important algorithms and data structures used in sequence analysis for applica...
In recent years, sequencing throughput increased dramatically with the introduction of so-cal...
International audience. With High Throughput Sequencing (HTS) technologies, biology is experiencing a ...
The combination of incessant advances in sequencing technology producing large amounts of data and i...
High-throughput sequencing has helped to transform our study of biological organisms and processes. ...
The work presented in this dissertation deals with establishing efficient methods for solving some a...
A number of technological advancements in high-throughput genome sequencing have led to the generat...
Searching for matches between large collections of short (14-30 nucleotides) words and sequence data...
Over the past years, high-throughput sequencing (HTS) has become an invaluable method of investigati...
In this article, we propose a novel pattern matching algorithm, called BAPM, that performs searching...