Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, int...
This work introduces a companion reproducible paper with the aim of allowing the exact replication o...
Let D={T1,T2,…,TD} be a collection of D documents having n characters in total. Given two patterns P...
We give new space/time tradeoffs for compressed indexes that answer document retrieval queri...
Document retrieval aims at finding the most important documents where a pattern appears i...
We consider document listing on string collections, that is, finding in which strings a given patter...
In the document retrieval problem [9], we are given a collection of documents (strings) ...
Given a collection of strings, document listing refers to the problem of finding all the strings (or...
Given a collection of strings (called documents), the top-k document retrieval problem is ...
We address the problem of counting the number of strings in a collection where a given pat...
We address the problem of indexing a collection D = {T1,T2,...,TD} of D string documents of total leng...
Indexing highly repetitive collections has become a relevant problem with the emergence of large rep...
Highly repetitive strings are increasingly being amassed by genome sequencing experiments, and by ve...