In the text retrieval community, many researchers have demonstrated good search quality on a current snapshot of the Web. However, only a few have demonstrated good search quality in a long-term archival domain, where documents are preserved for a long time, i.e., ten years or more. In such a domain, a search application is useful not only to archivists and historians, but also in the context of national libraries and enterprise search (searching document repositories, emails, etc.). In the rest of this paper, we explain three problems of searching document archives and propose possible approaches to solving them. Our main research question is: How to improve the quality of search in a document archive using tempor...
The availability of versioned text collections such as the Internet Archive opens up opportunities...
The essential quality of information in a digital library is accessibility. Full text search is not ...
Although the problems of optical character recognition for contemporary printed text have been resol...
Web archives include both archives of contents originally published on the Web (e.g., the Internet A...
The Web has become the main publication medium world-wide, covering almost every facet of human acti...
There have been numerous recent efforts to digitize previously published content and to preserve bo...
Time-travel text search enriches standard text search by temporal predicates, so that users of web a...
In this thesis, we address major challenges in searching temporal document collections. In such coll...
A number of emerging large scale applications such as web archiving and time-stamped web objects ge...
Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciê...
Text search over temporally versioned document collections such as web archives has received little ...
A Web archive usually contains multiple versions of documents crawled from the Web at different poin...
An unprecedented amount of information encompassing almost every facet of human activities across th...
Getting an overview of a historic entity or event can be difficult in search results, especially if ...