This thesis studies document signatures, which are small representations of documents and other objects that can be stored compactly and compared for similarity. This research finds that document signatures can be used effectively and efficiently both to search large collections and to understand relationships between the documents in them, and that the approach scales well enough to search a billion documents in a fraction of a second. Deliverables arising from the research include an investigation of the representational capacity of document signatures, the publication of an open-source signature search platform, and an approach for scaling signature retrieval to operate efficiently on collections containing hundreds of millions of documents.
An end-to-end architecture for multi-script document retrieval using handwritten signatures is propo...
Nearest Neighbor Search for similar document retrieval suffers from an efficiency problem w...
Part 1: THEMES AND ISSUES. The relentless increase in storage capacity and decre...
This paper describes a new method of indexing and searching large binary signature collections to e...
Signature files are extremely compressed versions of text files which can be used as access or index...
Summarization: In this paper we study a variation of the signature file access method for text and a...
A signature file organization, called the weight-partitioned signature file, for supporting document...
This work fulfills sublinear time Nearest Neighbor Search (NNS) in massive-scale document collectio...
Given a collection of strings, document listing refers to the problem of finding all the strings (or...
Signature file has been shown to be a very good access method for text and multiattribute retrieval ...
Signature files are one technique for indexing documents for full-text retrieval systems. This paper...
Signature files seem to be a promising access method for text and attributes. According to this meth...
We show how to learn a deep graphical model of the word-count vectors obtained from a large ...
We use the signature file method to search for partially specified terms in large lexicons. To optim...