Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full resolution. Since our approach relies directly on redun...
With high throughput DNA sequencing costs dropping below $1000 for human genomes, data storage, retr...
Motivation: The past decade has seen the introduction of new technologies that lowered the cost of g...
It is becoming increasingly impractical to indefinitely store raw sequencing data for later processi...
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing th...
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing th...
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing th...
Motivation: The past decade has seen the introduction of new technologies that significantly lowered...
Abstract—Recent advancements in sequencing tech-nology have led to a drastic reduction in the cost o...
To the Editor: Most next-generation sequencing (NGS) quality scores are space intensive, redundant ...
Current NGS techniques are becoming exponentially cheaper. As a result, there is an exponential grow...
Massive amounts of sequencing data are being generated thanks to advances in sequencing technology a...
High-throughput sequencing of DNA molecules has revolutionized biomedical research by enabling the q...
Motivation: The growth of next-generation sequencing means that more effective and efficient archivi...
The intensive research interest in studying genomes has led to a series of advances in DNA sequencin...
DNA sequencing is the process of determining the ordered sequence of the four nucleotide bases in a ...
With high throughput DNA sequencing costs dropping below $1000 for human genomes, data storage, retr...
Motivation: The past decade has seen the introduction of new technologies that lowered the cost of g...
It is becoming increasingly impractical to indefinitely store raw sequencing data for later processi...
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing th...
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing th...
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing th...
Motivation: The past decade has seen the introduction of new technologies that significantly lowered...
Abstract—Recent advancements in sequencing tech-nology have led to a drastic reduction in the cost o...
To the Editor: Most next-generation sequencing (NGS) quality scores are space intensive, redundant ...
Current NGS techniques are becoming exponentially cheaper. As a result, there is an exponential grow...
Massive amounts of sequencing data are being generated thanks to advances in sequencing technology a...
High-throughput sequencing of DNA molecules has revolutionized biomedical research by enabling the q...
Motivation: The growth of next-generation sequencing means that more effective and efficient archivi...
The intensive research interest in studying genomes has led to a series of advances in DNA sequencin...
DNA sequencing is the process of determining the ordered sequence of the four nucleotide bases in a ...
With high throughput DNA sequencing costs dropping below $1000 for human genomes, data storage, retr...
Motivation: The past decade has seen the introduction of new technologies that lowered the cost of g...
It is becoming increasingly impractical to indefinitely store raw sequencing data for later processi...