Linguistic query systems are special-purpose IR applications. As the text sizes, annotation layers, and metadata schemes of language corpora grow rapidly, performing complex searches becomes a computationally expensive task. We evaluate several storage models and indexing variants in two multi-processor/multi-core environments, focusing on prototypical linguistic querying scenarios. Our aim is to reveal modeling and querying tendencies, rather than absolute benchmark results, when using a relational database management system (RDBMS) and MapReduce for natural language corpus retrieval. Based on these findings, we will improve our approach to the efficient exploitation of very large corpora, combining advantages of state-of-t...
Recent years have seen an increased interest in and availability of many different kinds of corpora....
N-gram language models are an essential component in statistical natural language processing systems...
(Please send correspondences to Cheng Hsu.) A truly natural language interface to databases also nee...
Linguistic query systems are special purpose IR applications. We present a novel state-of-the-art ap...
LexiDB is a tool for storing, managing and querying corpus data. In contrast to other database manag...
We present an approach for searching and exploring translation variants of multi-word units in large...
The availability of large multi-parallel corpora offers an enormous wealth of material to contrastiv...
Humanities researchers are producing large volumes and heterogeneous varieties of language and liter...
This paper proposes a simple mechanism for supporting multiple overlapping layers of annotations for...
This thesis presents the patterns and methods uncovered in the development of a new scalable corpus ...
Large and open multiparallel corpora are a valuable resource for contrastive corpus linguists if the...
The need for efficient corpus indexing and querying arises frequently both in machine learning-based...
Recent years have seen an increased interest in and availability of parallel corpora. Large corpora ...