Finding a criterion that distinguishes a text from an arbitrary set of words is a relevant task in its own right, for instance for developing tools for internet-content indexing [1] or for separating signal from noise in communication channels [2]. Zipf's law is currently considered the most reliable criterion of this kind [3]; at any rate, conventional stochastic word sets do not obey it. The present paper deals with one possible criterion, based on determining the degree of data compression. The most natural approach to the problem is, no doubt, to study autocorrelations in the sequences of words that form a document. Data compression used in various file compression systems, in pa...
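As a rough illustration of a compression-based criterion of the kind the abstract describes, the sketch below compares how well a word sequence compresses before and after its word order is randomized. It is not the authors' method: `zlib` stands in for whatever compressor the paper uses, and the names `compression_ratio`, `shuffle_words` and `sample` are hypothetical, introduced only for this example.

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more redundancy."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    return len(zlib.compress(raw, 9)) / len(raw)


def shuffle_words(text: str, seed: int = 0) -> str:
    """Return the same words in random order, destroying word-order correlations."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


# Toy input (an assumption for the demo): a highly ordered word sequence.
sample = "the quick brown fox jumps over the lazy dog " * 50

ordered_ratio = compression_ratio(sample)
shuffled_ratio = compression_ratio(shuffle_words(sample))
print(f"ordered: {ordered_ratio:.3f}  shuffled: {shuffled_ratio:.3f}")
```

Shuffling keeps the word frequencies unchanged, so any difference between the two ratios comes from word order alone. On real documents the gap is likely to be much smaller than on this artificial sample.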
In this paper we focus on the problem of compressed pattern matching for the text compression using...
10th International Conference on Electronics, Computer and Computation (ICECCO) -- NOV 07-09, 2013 -...
Data Compression may be defined as the science and art of the representation of information in a cri...
The present chapter describes a few standard algorithms used for processing texts. They apply, for.....
Zipf’s law has intrigued people for a long time. This distribution models a ce...
The best general-purpose compression schemes make their gains by estimating a probability distributi...
Here we sketch a new derivation of Zipf's law for word frequencies based on optimal coding. The stru...
the sequence. It gives us all text positions of each duplicated pattern. • The program chooses ea...
Written text is one of the fundamental manifestations of human language, and the study of its univer...
Surveys techniques that solve the two basic problems of efficiency (in storage and computation) at t...
An algorithm for very efficient compression of a set of natural language text files is presented. No...
More than 40 different schemes for performing text compression have been proposed in the literature....
In his work on the information content of English text in 1951, Shannon described a method of recodi...
Dictionary-based compression schemes are the most commonly used data compression schemes since they ...