Shannon’s entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is b, the least number of phrases of a general bidirectional parse of a text, where phrases can be copied from anywhere else in the text. Since computing b is NP-complete, a popular gold standard is z, the number of phrases in the Lempel-Ziv parse of the text, which is computed in linear time and yields the least number of phrases when those can be copied only from the left. Almost nothing has been known for decades about the approximation ratio of z with respect to b. In this paper we prove that z = O(b log(n/b)), where n is the text length. We also show that the bound is ti...
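To make the two measures concrete, here is a minimal Python sketch. It assumes the LZ variant in which each phrase is either the longest prefix of the remaining text that also starts at an earlier position (the source may overlap the phrase) or a single explicit character seen for the first time. The function names `lz_parse` and `decode_parse` are hypothetical, chosen only for illustration.

The parser is a naive brute-force scan, not the linear-time algorithm the abstract refers to. The decoder accepts any bidirectional parse, where sources may lie anywhere in the text. It rejects parses whose copy pointers form a cycle, since those cannot be decoded. Because every LZ parse is also a valid bidirectional parse, the same decoder illustrates why b ≤ z.

```python
from typing import List, Tuple, Union

# A phrase is an explicit character or a (source_position, length) copy.
Phrase = Union[str, Tuple[int, int]]


def lz_parse(text: str) -> List[Phrase]:
    """Greedy left-to-right LZ parse (assumed variant, naive time).

    Each phrase copies the longest prefix of the remaining text that also
    starts at some earlier position j < i (overlap allowed), or is a single
    explicit character if that character has not occurred before.
    """
    phrases: List[Phrase] = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # candidate sources strictly to the left
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(text[i])  # first occurrence of this character
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def decode_parse(phrases: List[Phrase], n: int) -> str:
    """Decode a bidirectional parse of a length-n text.

    Sources may point anywhere, so each position is resolved by following
    source pointers until an explicit character is reached. A pointer chain
    longer than n must revisit a position, i.e. the parse is cyclic.
    """
    link: List[Union[str, int]] = []
    for p in phrases:
        if isinstance(p, str):
            link.append(p)
        else:
            src, length = p
            link.extend(src + k for k in range(length))
    if len(link) != n:
        raise ValueError("phrases do not cover the text exactly")
    out = []
    for i in range(n):
        cur, steps = link[i], 0
        while not isinstance(cur, str):
            if not 0 <= cur < n:
                raise ValueError("source position out of range")
            cur = link[cur]
            steps += 1
            if steps > n:
                raise ValueError("cyclic parse: cannot be decoded")
        out.append(cur)
    return "".join(out)
```

For example, `lz_parse("abababab")` yields `["a", "b", (0, 6)]`, so z = 3. The bidirectional parse `[(2, 6), "a", "b"]`, which copies from the right, decodes to the same string with the same number of phrases.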
Abstract. In this paper we investigate the problem of building a static data structure that represen...
Consider the parsing algorithm due to Lempel and Ziv that partitions a sequence of length n into var...
Can we analyze data without decompressing it? As our data keeps growing, understanding the time comp...
Lempel-Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely-used compressors for repet...
We show that a wide class of dictionary compression methods (including LZ77, LZ78, grammar compresso...
Lempel–Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely-used compressors for repet...
While the kth order empirical entropy is an accepted measure of the compressibility of individual s...
A well-known fact in the field of lossless text compression is that high-order entropy is a weak mod...
Several recently-proposed data compression algorithms are based on the idea of representing a string...
Dictionary-based compression schemes are the most commonly used data compression schemes since they ...
© 1963-2012 IEEE. Irreducible grammars are a class of context-free grammars with well-known represen...
The goal of this contribution is twofold: (i) to introduce a generalized Lempel-Ziv parsing scheme, ...
Abstract—This paper addresses the smallest grammar problem: What is the smallest context-free gramma...
We investigate two closely related LZ78-based compression schemes: LZMW (an old scheme by Miller and...