In this paper, we apply grammar-based pre-processing prior to using the Prediction by Partial Matching (PPM) compression algorithm. This achieves significantly better compression of natural language texts than other well-known compression methods. In a pre-processing phase prior to compression, our method first generates a grammar based on the most common two-character sequences (bigraphs) or three-character sequences (trigraphs) in the text being compressed, and then substitutes these sequences with the respective non-terminal symbols defined by the grammar. This leads to significantly improved compression results for various natural languages (a 5% improvement for American English, 10% for British English, 29% for ...
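The pre-processing step described above can be illustrated with a minimal Python sketch for the bigraph case. It is not the authors' implementation: the function names, the `n_rules` parameter, and the use of Unicode Private Use Area code points (starting at `0xE000`) as non-terminal symbols are all assumptions made for illustration, and the sketch assumes the input text contains no Private Use Area characters. The PPM stage itself is not shown; the substituted text would simply be passed to a PPM compressor afterwards.

```python
from collections import Counter


def build_bigraph_grammar(text, n_rules=100, first_symbol=0xE000):
    """Map the n_rules most frequent bigraphs to fresh non-terminal symbols.

    Non-terminals are taken from the Unicode Private Use Area (an assumption
    of this sketch), so the input text must not already contain them.
    """
    # Count every overlapping two-character sequence in the text.
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    grammar = {}
    for k, (bigraph, _) in enumerate(counts.most_common(n_rules)):
        grammar[bigraph] = chr(first_symbol + k)
    return grammar


def substitute(text, grammar):
    """Replace bigraphs with their non-terminals in one left-to-right pass."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in grammar:
            out.append(grammar[pair])
            i += 2  # both characters of the bigraph are consumed
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def expand(encoded, grammar):
    """Invert substitute(): expand each non-terminal back to its bigraph."""
    inverse = {symbol: bigraph for bigraph, symbol in grammar.items()}
    return "".join(inverse.get(ch, ch) for ch in encoded)


if __name__ == "__main__":
    sample = "the theme of the thesis"
    rules = build_bigraph_grammar(sample, n_rules=2)
    encoded = substitute(sample, rules)
    # The substitution is lossless: expanding restores the original text.
    print(expand(encoded, rules) == sample)
    print(len(sample), len(encoded))
```

The greedy left-to-right pass is one simple design choice; a trigraph variant would follow the same pattern with three-character windows. Because the grammar must be available to the decoder, a real pipeline would also need to transmit or reconstruct it, which this sketch leaves out.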
This article makes several improvements to the classic PPM algorithm, resulting in a new algorithm w...
IEEE Computer Society. ITCC 2005 - International Conference on Information Technology: Coding and Comp...
Grammar-based compression is a well-studied technique to construct a context-free grammar (CFG) deri...
In this paper, we introduce several new universal pre-processing techniques to improve Prediction by...
196 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2000. We then turn to construction ...
This paper introduces a simple dynamic programming algorithm for performing text prediction. The a...
The best general-purpose compression schemes make their gains by estimating a probability distributi...
Large alphabet languages such as Chinese present different problems for language modelling compared ...
Abstract. There is a close relationship between formal language theory and data compression. Since 1...
We give a detailed algorithm for fast text compression. Our algorithm, related to the PPM method, si...
This work concerns the search for text compressors that compress better than existing dictionary cod...
The PPM (Prediction by Partial Matching) family of text compression algorithms has several members t...
The on-line sequence modelling algorithm 'Prediction by Partial Matching' (PPM) has set the pe...
Alignment of parallel corpora is a crucial step prior to training statistical language models for ma...