The role of the Web in text corpus construction is becoming increasingly significant. However, the contribution of the Web is largely confined to building a general virtual corpus or low-quality specialised corpora. In this paper, we introduce a new technique called SPARTAN for constructing specialised corpora from the Web by systematically analysing website contents. Our evaluations show that the corpora constructed using our technique are independent of the search engines employed. In particular, SPARTAN-derived corpora outperform all corpora based on existing techniques for the task of term recognition.
Corpus-based terminology can be described as a working method which consists in exploring a domain-s...
The paper compares systematically the utility of specially-made text corpora and the textual resourc...
We investigate the potential of using the web as a huge corpus for language studies. We test the hyp...
Web corpora are a cornerstone of modern Language Technology. Corpora built from the web are convenie...
This paper describes preliminary work in corpus-based indexing of a sizeable specialized Web portal,...
Encyclopedias, which describe general/technical terms, are valuable language resources (LRs). As wit...
The Web is an inexhaustible reservoir of machine-readable texts in most of the world’s written langu...
Over the last decade, methods of web corpus construction and the evaluation of web corpora have been...
We present a web service for quickly producing corpora for specialist areas, in any of a range of la...
As machines become more and more intelligent, it is reasonable to expect the automatic constr...
Efforts to use web data as corpora seek to provide solutions to problems traditional corpora suffer ...
This paper addresses the problem of categorizing terms or lexical entities into a predefined set of ...
The conventional tools of the "web as corpus" framework rely heavily on URLs o...
In corpus-based lexicography and natural language processing fields some authors have prop...