Semistatic word-based byte-oriented compression codes are known to be attractive alternatives for compressing natural language texts. With compression ratios around 30%, they allow direct pattern searching on the compressed text up to 8 times faster than on its uncompressed version. In this paper we reveal that these compressors have even more benefits. We show that most state-of-the-art compressors, such as the block-wise bzip2, those from the Ziv-Lempel family, and the predictive ppm-based ones, can benefit from compressing not the original text, but its compressed representation obtained by a word-based byte-oriented statistical compressor. In particular, our experimental results show that using Dense-Code-based compression as a prepr...
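The two-stage idea in the abstract above (a word-based byte-oriented code first, a general-purpose compressor second) can be sketched as follows. This is only an illustration under stated assumptions: it uses End-Tagged Dense Code as the dense code, a simple regex tokenizer, and Python's bz2 module as the second stage. The abstract does not say which variant or tokenizer the authors use, and all function names here are hypothetical. The vocabulary is returned separately and is not counted or serialized, as a real compressor would have to do.

```python
import bz2
import re
from collections import Counter


def etdc_codeword(rank: int) -> bytes:
    """End-Tagged Dense Code for a 0-based frequency rank (assumed variant).

    The last byte of a codeword has its high bit set (128..255);
    all earlier bytes have it clear (0..127).
    """
    out = [128 + rank % 128]          # final, tagged byte
    rank //= 128
    while rank > 0:
        rank -= 1
        out.append(rank % 128)        # continuation byte, high bit clear
        rank //= 128
    return bytes(reversed(out))


def compress_words(text: str):
    """First stage: replace each token by the codeword of its rank."""
    # Illustrative tokenizer: runs of word characters and runs of separators.
    tokens = re.findall(r"\w+|\W+", text)
    # Most frequent tokens get the smallest ranks, hence the shortest codewords.
    vocab = [tok for tok, _ in Counter(tokens).most_common()]
    rank_of = {tok: r for r, tok in enumerate(vocab)}
    encoded = b"".join(etdc_codeword(rank_of[t]) for t in tokens)
    return vocab, encoded


def decompress_words(vocab, encoded: bytes) -> str:
    """Inverse of compress_words: a tagged byte ends the current codeword."""
    out, rank = [], 0
    for b in encoded:
        if b >= 128:
            out.append(vocab[rank * 128 + (b - 128)])
            rank = 0
        else:
            rank = rank * 128 + b + 1
    return "".join(out)


def two_stage_compress(text: str):
    """Second stage: run a general-purpose compressor over the byte codes."""
    vocab, encoded = compress_words(text)
    return vocab, bz2.compress(encoded)
```

Because every codeword is a whole number of bytes and ends with a tagged byte, the first-stage output is still a byte stream that a second compressor can process directly. Decompression reverses the two stages: `decompress_words(vocab, bz2.decompress(packed))`.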
Abstract. We present a technique to build an index based on suffix arrays for compressed texts. We also...
Dictionary-based compression algorithms include a parsing strategy to transform the input text into ...
Classic textual compression methods work over the alphabet of characters or alphabet of words. For l...
Semistatic word-based byte-oriented compressors are known to be attractive alternatives to compress ...
Semistatic byte-oriented word-based compression codes have been shown to be an attractive alternativ...
This thesis is an exploration of hybrid dictionary/statistical algorithms for compressing textual in...
The use of a text processing algorithm that can improve the compression ratio of standard data compr...
An algorithm for very efficient compression of a set of natural language text files is presented. No...
This work presents (s, c)-Dense Code, a new method for compressing natural language texts. This tec...
We address the problem of adaptive compression of natural language text, focusing on the case where ...
The best general-purpose compression schemes make their gains by estimating a probability distributi...
EGOTHOR is a search engine that indexes the Web and allows users to search Web documents. Its hit l...
We report on a new experimental analysis of high-order entropy-compressed suffix arrays, which retai...
In this Ph.D. thesis we investigate several data compression methods on text in natural language. O...