We describe SCMPPM, a compression technique for semistructured documents that combines the Prediction by Partial Matching (PPM) technique with the Structural Contexts Model (SCM) technique. SCMPPM takes advantage of the context information usually implicit in the structure of the text. The idea is to use a separate PPM model to compress the text that lies inside each different structure type (e.g., each different XML tag). The intuition is that the texts belonging to a given structure type should have similar distributions, which differ from those of other structure types. This should allow PPM to make better predictions. We test our idea against plain PPM modelling, as well as against other structure-aware techniques. Results show that th...
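The per-structure modelling idea can be illustrated with a minimal sketch. It is not the SCMPPM implementation: it assumes a much simpler adaptive order-1 character model with add-one smoothing in place of PPM (no escape mechanism, no arithmetic coder), and it only estimates the ideal code length in bits. The names `AdaptiveOrder1Model`, `cost_single_model`, and `cost_per_tag_models`, as well as the `(tag, text)` segment representation, are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict


class AdaptiveOrder1Model:
    """Adaptive order-1 character model with add-one smoothing.

    Assumption: a deliberately simplified stand-in for PPM, with a single
    context order and no escape mechanism.
    """

    def __init__(self, alphabet_size=256):
        self.alphabet_size = alphabet_size
        # counts[previous_char][next_char] -> occurrences seen so far
        self.counts = defaultdict(lambda: defaultdict(int))
        # totals[previous_char] -> symbols seen so far in that context
        self.totals = defaultdict(int)

    def cost_and_update(self, text):
        """Return the ideal code length of `text` in bits.

        The model is updated after each symbol, as an adaptive coder would do.
        The context is reset to the empty string at the start of each call.
        """
        bits = 0.0
        prev = ""
        for ch in text:
            p = (self.counts[prev][ch] + 1) / (self.totals[prev] + self.alphabet_size)
            bits -= math.log2(p)
            self.counts[prev][ch] += 1
            self.totals[prev] += 1
            prev = ch
        return bits


def cost_single_model(segments):
    """Estimate the cost when one model sees the text of every structure type."""
    model = AdaptiveOrder1Model()
    return sum(model.cost_and_update(text) for _, text in segments)


def cost_per_tag_models(segments):
    """Estimate the cost when each structure type (tag) has its own model."""
    models = defaultdict(AdaptiveOrder1Model)
    return sum(models[tag].cost_and_update(text) for tag, text in segments)


if __name__ == "__main__":
    # Hypothetical (tag, text) segments, as if extracted from an XML document.
    segments = [
        ("date", "2004-03-17"), ("name", "Alice Smith"),
        ("date", "2004-03-18"), ("name", "Alice Jones"),
        ("date", "2005-11-02"), ("name", "Bob Smith"),
    ]
    print("single model  :", round(cost_single_model(segments), 1), "bits")
    print("per-tag models:", round(cost_per_tag_models(segments), 1), "bits")
```

Whether the per-tag variant produces a smaller estimate depends on the data: separate models avoid mixing unrelated statistics, but each one also has fewer symbols to learn from.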
Query performance issues over semi-structured data have led to the emergence of materialised XML vie...
This work concerns the search for text compressors that compress better than existing dictionary cod...
We describe a technique that allows end-users to specify automated transformations of structured tex...
EGOTHOR is a search engine that indexes the Web and allows us to search the Web documents. Its hit l...
New methods of acquiring structural information in text documents may support better compression by ...
This thesis is dedicated to analysis of context-based compression methods, their characteristics and...
This paper takes a compression scheme that infers a hierarchical grammar from its input, and investi...
Many computer files contain highly-structured, predictable information interspersed with information...
The authors describe Lempel-Ziv to Compress Structure (LZCS), a novel Lempel-Ziv appr...
Much research has been undertaken in order to speed up the processing of semistructured data in gene...
Sharing of common subtrees has been reported useful not only for XML compression but also ...
In this paper, we apply grammar-based pre-processing prior to using the Prediction by Partial Matchi...
Context modeling has emerged as the most promising new approach to compressing text. While context-...
A significant share of electronically available documents is classified as semi-structured docume...