Abstract. In this paper we introduce ukWaC, a large corpus of English constructed by crawling the .uk Internet domain. The corpus contains more than 2 billion tokens and is one of the largest freely available linguistic resources for English. The paper describes the tools and methodology used in the construction of the corpus and provides a qualitative evaluation of its contents, carried out through a vocabulary-based comparison with the BNC. We conclude by giving practical information about availability and format of the corpus.
A. Ferraresi; E. Zanchetta; M. Baroni; S. Bernardini
We have built a corpus containing texts in 106 languages from texts available on the Internet and on...
Efforts to use web data as corpora seek to provide solutions to problems traditional corpora suffer ...
This paper reports on the construction of the Cambridge and Nottingham e-language Corpus (CANELC). Th...
This article introduces ukWaC, deWaC and itWaC, three very large corpora of English, German, and Ita...
This book presents a richly illustrated, hands-on discussion of one of the fastest growing fields in...
This paper is a brief description of the British National Corpus (BNC) project, which is a collaborat...
The American National Corpus (ANC) project is developing a corpus comparable to the British National...
As a result of the European Union’s pressure towards internationalization, universities in many coun...
From the beginning of the twentieth century on, the use of the World Wide Web has become a current t...