In computational linguistics, large tree databases tagged with morpho-syntactic information require fast retrieval of multiway tree structures. To tackle this problem, we present Treegram indexing, a generalization of the classical n-gram indexing technique. As an application of Treegram indexing, we describe the Venona retrieval system, which handles the BHt treebank containing 508,650 phrase structure trees.

1 Tree Retrieval

Multiway trees (MT, henceforth) play a central role in representing complex linguistic information because they are a common and well-understood data structure for describing hierarchical information. With the availability of large treebanks, retrieval techniques for highly structured data now become ess...
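The idea of carrying n-gram indexing over to trees can be illustrated with a minimal sketch. The excerpt above does not define what a treegram is, so the code below makes its own assumptions: a tree is a `(label, children)` tuple, and the indexed unit is simply a parent-child label pair (the hypothetical `edge_grams` helper). It is not the Venona implementation; it only shows how an inverted index over small tree fragments can narrow down candidate trees before an exact structural match is attempted.

```python
from collections import defaultdict

# Assumption: a tree is a (label, [children]) tuple. This representation and
# all function names here are illustrative, not taken from the paper.

def edge_grams(tree):
    """Yield a (parent_label, child_label) pair for every edge in the tree."""
    label, children = tree
    for child in children:
        yield (label, child[0])
        yield from edge_grams(child)

def build_index(trees):
    """Map each edge-gram to the set of ids of the trees that contain it."""
    index = defaultdict(set)
    for tree_id, tree in enumerate(trees):
        for gram in edge_grams(tree):
            index[gram].add(tree_id)
    return index

def candidates(index, query, n_trees):
    """Return the ids of trees that contain every edge-gram of the query.

    This is only a filter: a candidate still has to be checked with an
    exact structural match, because shared edge-grams do not guarantee
    that the query occurs as a connected subtree.
    """
    result = set(range(n_trees))
    for gram in edge_grams(query):
        result &= index.get(gram, set())
    return result

trees = [
    ("S", [("NP", [("N", [])]), ("VP", [("V", []), ("NP", [("N", [])])])]),
    ("S", [("NP", [("Det", []), ("N", [])]), ("VP", [("V", [])])]),
    ("NP", [("Det", []), ("N", [])]),
]
index = build_index(trees)
query = ("NP", [("Det", []), ("N", [])])
print(sorted(candidates(index, query, len(trees))))
```

As with string n-grams, the index trades precision for speed: intersecting the posting sets is cheap, and the more expensive tree matching only runs on the trees that survive the filter.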
Three sides existed whose connection is solved in this thesis. First, it was the Prague Dependency T...
Discovering frequent structures within large natural language corpora is one of the core problems of...
We present a generalization of the classical n-gram indexing technique called Treegram indexing, ...
This is a pre-print of a paper from Human Language Technologies: Proceedings of the 11th Annual Conf...
There has been recent interest in looking at what is required for a tree query language for linguis-...
Suffix trees are data structures that can be used to index a corpus. In this paper, we explore how s...
The amount of data that is available for research grows rapidly, yet technology to efficiently inter...
Treebanks constitute a valuable resource for linguists, but their usefulness is often reduced by har...
Databases of hierarchically annotated text occupy a central place in linguistic research and languag...
• increasing demand for parallel treebanks worldwide • other projects: mostly focused on machine tra...
This paper discusses the construction of a parallel treebank currently involving ten languages from ...