Relative compression, in which a set of similar strings is compressed with respect to a reference string, is a very effective method of compressing DNA datasets containing multiple similar sequences. Relative compression is fast to perform and also supports rapid random access to the underlying data. The main difficulty in relative compression is selecting an appropriate reference sequence. In this paper, we explore using the dictionary of repeats generated by the Comrad, Re-pair and Dna-x algorithms as reference sequences for relative compression. We show that this technique yields better compression while supporting random access just as well. The technique also allows more general repetitive datasets to be compressed using relative compression.
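To make the idea concrete, here is a minimal sketch of relative compression in the style of Relative Lempel-Ziv (RLZ). Each target string is greedily parsed into (position, length) phrases copied from the reference, and an array of phrase end offsets gives random access without decoding the whole string. This is an illustration under simplifying assumptions, not the paper's implementation. The function names are hypothetical, the longest-match search is a naive quadratic scan (practical implementations typically search a suffix array of the reference), and characters absent from the reference are stored as literals.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

# Hypothetical phrase encoding (pos, length, literal):
#   length > 0  -> copy reference[pos:pos + length]
#   length == 0 -> emit the single character `literal` (absent from reference)
Phrase = Tuple[int, int, str]


def longest_match(reference: str, target: str, i: int) -> Tuple[int, int]:
    """Naively find the longest prefix of target[i:] that occurs in reference.

    Returns (pos, length); length is 0 if target[i] never occurs in reference.
    """
    best_pos, best_len = 0, 0
    for j in range(len(reference)):
        k = 0
        while (i + k < len(target) and j + k < len(reference)
               and reference[j + k] == target[i + k]):
            k += 1
        if k > best_len:
            best_pos, best_len = j, k
    return best_pos, best_len


def rlz_compress(reference: str, target: str) -> List[Phrase]:
    """Greedily parse target into phrases copied from the reference."""
    phrases: List[Phrase] = []
    i = 0
    while i < len(target):
        pos, length = longest_match(reference, target, i)
        if length == 0:
            phrases.append((0, 0, target[i]))  # literal character
            i += 1
        else:
            phrases.append((pos, length, ""))
            i += length
    return phrases


def phrase_text(reference: str, phrase: Phrase) -> str:
    """Decode a single phrase back into target characters."""
    pos, length, literal = phrase
    return reference[pos:pos + length] if length else literal


def rlz_decompress(reference: str, phrases: List[Phrase]) -> str:
    """Decode the whole target by concatenating every phrase."""
    return "".join(phrase_text(reference, p) for p in phrases)


def rlz_extract(reference: str, phrases: List[Phrase], start: int, end: int) -> str:
    """Random access: return target[start:end], decoding only overlapping phrases."""
    # ends[k] is the target offset just past phrase k (a literal counts as 1).
    # A real system would store this array once instead of rebuilding it per call.
    ends = list(accumulate(max(p[1], 1) for p in phrases))
    k = bisect_right(ends, start)  # first phrase whose end lies past `start`
    phrase_start = ends[k - 1] if k > 0 else 0
    offset, out = phrase_start, []
    while k < len(phrases) and offset < end:
        text = phrase_text(reference, phrases[k])
        out.append(text)
        offset += len(text)
        k += 1
    return "".join(out)[start - phrase_start:end - phrase_start]


if __name__ == "__main__":
    ref, tgt = "ACGTACGTTT", "ACGTTTACGA"
    phrases = rlz_compress(ref, tgt)
    print(phrases)                          # [(4, 6, ''), (0, 3, ''), (0, 1, '')]
    print(rlz_decompress(ref, phrases))     # ACGTTTACGA
    print(rlz_extract(ref, phrases, 2, 8))  # GTTTAC
```

The sketch also shows why reference selection matters: the longer the matches a target finds in the reference, the fewer phrases must be stored, so a reference built from the repeats shared across the collection can shrink the parse while leaving random access unchanged.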