Web corpora are a cornerstone of modern Language Technology. Corpora built from the web are convenient because their creation is fast and inexpensive. Several studies have been carried out to assess the representativeness of general-purpose web corpora by comparing them to traditional corpora. Less attention has been paid to assessing the representativeness of specialized or domain-specific web corpora. In this paper, we focus on the assessment of domain representativeness of web corpora, and we claim that it is possible to assess the degree of domain-specificity, or domainhood, of web corpora. We present a case study where we explore the effectiveness of different measures, namely the Mann-Whitney-Wilcoxon test, Kendall correlation coefficien...
In this work, we show that the difference in performance of embeddings from differently sourced data...
Abstract. In corpus-based lexicography and natural language processing fields, some authors have prop...
Efforts to use web data as corpora seek to provide solutions to problems traditional corpora suffer ...
In this paper we describe an approach to profile the domain specificity of specialized web corpora i...
In this paper we present our research concerning the relation between two properties of websites and...
The role of the Web for text corpus construction is becoming increasingly significant. However, the ...
In this study we consider the problem of determining whether an English corpus constructed from a gi...
We investigate the potential of using the web as a huge corpus for language studies. We test the hyp...
Over the last decade, methods of web corpus construction and the evaluation of web corpora have been...
Our paper describes an experiment aimed at assessing lexical coverage in web corpora in comparis...
The Web is a very rich source of linguistic data, and in the last few years it has been used very in...
The quality of statistical measurements on corpora is strongly related to a strict definition of the...
This paper discusses an investigation into the Norwegian NoWaC corpus. We have compared this web co...
Abstract. The 60-year-old dream of computational linguistics is to make computers capable of communi...