Abstract. We address the problem of counting the strings in a collection in which a given pattern appears, which has applications in information retrieval and data mining. Existing solutions are in a theoretical stage. We implement these solutions and develop some new variants, comparing them experimentally on various datasets. Our results not only show which options are best for each situation and help discard practically unappealing solutions, but also uncover some unexpected compressibility properties of the best data structures. By taking advantage of these properties, we can reduce the size of the structures by a factor of 5–400, depending on the dataset.
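To make the problem statement concrete, the following is a minimal Python sketch of document counting, not the data structures evaluated in the paper. It contrasts a linear scan with a naive sorted-suffix index that locates the suffix range matching the pattern and then counts the distinct documents in that range. All function names are hypothetical, and the index uses quadratic space, so it is for illustration only.

```python
def document_count_naive(docs, pattern):
    """Count the documents that contain `pattern` at least once (linear scan)."""
    return sum(1 for d in docs if pattern in d)


def build_suffix_index(docs):
    """Sort every suffix of every document, remembering its source document.

    Returns two parallel lists: the sorted suffixes and their document ids.
    Storing suffixes explicitly takes quadratic space; illustration only.
    """
    pairs = sorted(
        (d[i:], doc_id) for doc_id, d in enumerate(docs) for i in range(len(d))
    )
    suffixes = [s for s, _ in pairs]
    doc_ids = [doc_id for _, doc_id in pairs]
    return suffixes, doc_ids


def _prefix_range(suffixes, pattern):
    """Return [start, end) of the sorted suffixes that begin with `pattern`."""
    m = len(pattern)
    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        if suffixes[mid][:m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose length-m prefix is > pattern.
    hi = len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        if suffixes[mid][:m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def document_count_indexed(index, pattern):
    """Count distinct documents among the suffixes that start with `pattern`."""
    suffixes, doc_ids = index
    start, end = _prefix_range(suffixes, pattern)
    return len(set(doc_ids[start:end]))


docs = ["banana", "bandana", "cabana", "apple"]
index = build_suffix_index(docs)
print(document_count_naive(docs, "ana"), document_count_indexed(index, "ana"))
```

In this sketch the indexed query still touches every occurrence in the matching range to collect distinct document ids, so its cost grows with the number of occurrences rather than with the number of documents reported.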
Highly repetitive strings are increasingly being amassed by genome sequencing experiments, and by ve...
We find generating functions for the number of strings (words) containing a specified number of occu...
Counting the number of times a pattern occurs in a database is a fundamental d...
We consider document listing on string collections, that is, finding in which strings a given patter...
Most of the fastest-growing string collections today are repetitive, that is, most of the constituen...
We consider document listing on string collections, that is, finding in which strings a given patter...
Abstract. Document retrieval aims at finding the most important documents where a pattern appears i...
Most of the fastest-growing string collections today are repetitive, that is, most of the constituen...
Given a collection of strings, document listing refers to the problem of finding all the strings (or...
Model counting is the problem of determining the number of solutions that satisfy a given set of co...
Abstract. Motivated by the problem of counting unique visitors to a website, we consider how to prep...
Given a collection of strings, document listing refers to the problem of finding all the strings (or...
In this paper we study the problem of estimating the number of occurrences of substrings in textual ...
Abstract. Suffix arrays are used in various applications and research areas like data compression or c...
Schürmann K-B, Stoye J. Counting suffix arrays and strings. Theoretical Computer Science. 2008;395(2...