In this Ph.D. thesis we investigate several data compression methods for natural-language text. Our study focuses on algorithms that use the word as the basic unit; these are usually called word-based text compression algorithms. We have developed algorithms that divide the original size of the text by 3.5 on average while keeping, by means of an index, direct access to the compressed form of the text. The set of words of a text (the lexicon) is not known a priori, so efficient compression of the text requires efficient compression of its lexicon. To this end, we have developed a compact representation of the lexicon that, combined with compression algorithms based on Markov chains, achieves very high compressio...
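The abstract does not spell out its coding scheme. The Python below is a minimal sketch of the word-based idea only, and all function names in it are hypothetical. It splits a text into alternating word and separator tokens, builds a frequency-ordered lexicon, and replaces each token with its integer id. It does not reproduce the thesis's index, its lexicon representation or its Markov-chain coders.

```python
import re
from collections import Counter

# Hypothetical tokenizer: maximal runs of word characters (\w+) alternate with
# runs of separators (\W+), so joining the tokens restores the text exactly.
TOKEN_RE = re.compile(r"\w+|\W+")


def tokenize(text):
    return TOKEN_RE.findall(text)


def build_lexicon(tokens):
    """Lexicon ordered by decreasing frequency (ties alphabetical), so that
    frequent tokens receive small integer ids."""
    counts = Counter(tokens)
    lexicon = sorted(counts, key=lambda tok: (-counts[tok], tok))
    ids = {tok: i for i, tok in enumerate(lexicon)}
    return lexicon, ids


def encode(text):
    """Replace every token by its lexicon id."""
    tokens = tokenize(text)
    lexicon, ids = build_lexicon(tokens)
    return lexicon, [ids[tok] for tok in tokens]


def decode(lexicon, codes):
    return "".join(lexicon[c] for c in codes)


text = "the cat and the dog and the bird"
lexicon, codes = encode(text)
assert decode(lexicon, codes) == text
print(lexicon)  # [' ', 'the', 'and', 'bird', 'cat', 'dog']
```

In a real word-based compressor the id sequence would then go to a statistical coder. The lexicon itself must also be stored alongside the codes, which is why its compact representation matters.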
This thesis in text algorithmics studies the compression, indexing and querying of a labeled text...
In this paper, we present a text compression technique which utilises morpheme-based text compressio...
The compression of texts written in natural language can exploit information about its linguistic ch...
An algorithm for very efficient compression of a set of natural language text files is presented. No...
Dictionary-based data compression algorithms include a strategy for pars...
We present two text compression algorithms that treat the text as a sequence al...
This research article presents a new efficient lossless text compression algorithm based on an exist...
From a computing standpoint, a text is formed by a sequence of characters a...
In this article, we study several ways of adapting the data compression algorithm...
Semistatic word-based byte-oriented compression codes are known to be attractive alternatives to com...
This thesis is an exploration of hybrid dictionary/statistical algorithms for compressing textual in...
Dictionary-based compression algorithms include a parsing strategy to transform the input text into ...
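The snippet is cut off before it says which parsing strategy is studied or what the text is transformed into. The sketch below is a hypothetical illustration under two assumptions: phrases come from a fixed, made-up dictionary, and the cost of a parse is simply its number of phrases (one code per phrase). Under those assumptions, it contrasts greedy longest-match parsing with an optimal parsing found by dynamic programming. The function names are illustrative.

```python
import math


def greedy_parse(text, dictionary):
    """Greedy parsing: at each position take the longest matching entry."""
    max_len = max(map(len, dictionary))
    phrases, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                phrases.append(text[i:i + length])
                i += length
                break
        else:  # no entry of any length matched at position i
            raise ValueError(f"cannot parse at position {i}")
    return phrases


def optimal_parse(text, dictionary):
    """Optimal parsing: fewest phrases, by dynamic programming over prefixes."""
    n = len(text)
    max_len = max(map(len, dictionary))
    best = [0] + [math.inf] * n  # best[j]: fewest phrases covering text[:j]
    back = [0] * (n + 1)         # back[j]: length of the last phrase in that parse
    for j in range(1, n + 1):
        for length in range(1, min(max_len, j) + 1):
            if text[j - length:j] in dictionary and best[j - length] + 1 < best[j]:
                best[j] = best[j - length] + 1
                back[j] = length
    if best[n] == math.inf:
        raise ValueError("text cannot be parsed with this dictionary")
    phrases, j = [], n
    while j > 0:  # walk back from the end to recover the phrases
        phrases.append(text[j - back[j]:j])
        j -= back[j]
    return phrases[::-1]


dictionary = {"a", "b", "c", "d", "ab", "bcd"}
print(greedy_parse("abcd", dictionary))   # ['ab', 'c', 'd']
print(optimal_parse("abcd", dictionary))  # ['a', 'bcd']
```

With this dictionary, greedy parsing spends three phrases on `abcd` where two suffice, which is why the parsing strategy matters. With real codes of unequal length, the dynamic program would minimize total bits instead of phrase count.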
Semistatic word-based byte-oriented compressors are known to be attractive alternatives to compress ...
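The snippet is truncated before it names a particular code. As an illustration only, the sketch below implements End-Tagged Dense Code (ETDC), a known semistatic word-based byte-oriented code; it is not necessarily the code studied in this work. The function names are hypothetical. For simplicity it also assumes the words are already split on whitespace, so separators are not encoded.

```python
from collections import Counter


def etdc_encode_rank(rank):
    """ETDC codeword for a 0-based frequency rank: the final byte is tagged
    (128..255), any preceding 'continuer' bytes lie in 0..127."""
    out = [128 + rank % 128]
    rank //= 128
    while rank > 0:
        rank -= 1
        out.append(rank % 128)
        rank //= 128
    return bytes(reversed(out))


def etdc_decode(data):
    """Split a byte string into codewords and return their ranks."""
    ranks, value = [], 0
    for b in data:
        if b < 128:  # continuer: the codeword goes on
            value = value * 128 + b + 1
        else:        # tagged byte: the codeword ends here
            ranks.append(value * 128 + (b - 128))
            value = 0
    return ranks


def compress(words):
    """Semistatic: one pass to count frequencies, a second pass to encode."""
    counts = Counter(words)
    vocab = sorted(counts, key=lambda w: (-counts[w], w))
    rank_of = {w: r for r, w in enumerate(vocab)}
    return vocab, b"".join(etdc_encode_rank(rank_of[w]) for w in words)


def decompress(vocab, data):
    return [vocab[r] for r in etdc_decode(data)]


words = "to be or not to be".split()
vocab, data = compress(words)
assert decompress(vocab, data) == words
print(len(data))  # 6: every rank is below 128, so one byte per word
```

This layout gives 128 one-byte codewords and 128² two-byte codewords, and the most frequent words take the shortest ones. Only the last byte of a codeword has its high bit set, so the code is prefix-free and any byte of value 128 or more marks a codeword boundary.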