The scope and scale of biological data continue to grow at an exponential rate, driven by advances in genetic sequencing and annotation and by the widespread adoption of surveillance efforts. For instance, the Sequence Read Archive (SRA) now contains more than 25 petabases of public data, while RefSeq, a collection of reference genomes, recently surpassed 100,000 complete genomes. In the process, biological data has outgrown the practical reach of many traditional algorithmic approaches in both time and space. Motivated by this extreme scale, this thesis details efficient methods for clustering and summarizing large collections of sequence data. While our primary area of interest is biological sequences, these approaches largely apply to sequence collections of ...
The advent of big data and advanced genomic sequencing technologies has presented challenges in ...
Abstract. Motivation: Information theoretic and compositional/linguistic analysis of genomes have a cent...
With the development of high-throughput and low-cost genotyping technologies, immense data can be ch...
As cost and throughput of second-generation sequencers continue to improve, even modestly resourced ...
Background: Distributed approaches based on the MapReduce programming paradigm have started to be pr...
Distributed approaches based on the MapReduce programming paradigm have started to be proposed in th...
The rise of next-generation sequencing has produced an abundance of data with almost limitless analy...
A revolution in personalized genomics will occur when scientists can sequence genomes of millions of...
An organism’s DNA sequence is a virtual cornucopia of information, and sequencing technology is the ...
Unexpected growth of high-throughput sequencing platforms in recent years impacted virtually all are...
Bioinformatic analyses generally involve passing genetic sequence data through a pipeline of transfo...
The dramatic progress in DNA sequencing technology over the last decade, with the revolutionary int...
The explosive growth in biological sequence data coupled with the design and deployment of increasin...
With High Throughput Sequencing (HTS) technologies, biology is experiencing a ...
We present research on the design, development and application of algorithms for DNA sequence analys...