Content: List of the 743 domains, their term vocabularies in 10 languages, and the Wikipedia articles associated with each domain, as extracted by the best model described in: Cristina España-Bonet, Alberto Barrón-Cedeño and Lluís Màrquez. “Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction.” ArXiv abs/2005.01177 (2020). https://github.com/cristinae/WikiTailor

Files Description:
commonCats2015.enesdefrcaareuelrooc.tsv: Multilingual domains listed one per line; languages are separated by a tab in the order en, es, de, fr, ca, ar, eu, el, ro and oc. For each language the pair "ID categoryName" is included, separated by a blank space.
[LAN].0.tar.bz: A folder per domain for language [LAN] containing th...
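For illustration, below is a minimal Python sketch of how the commonCats2015.enesdefrcaareuelrooc.tsv file could be read, assuming exactly the layout described above: one domain per line, tab-separated language fields in the order en, es, de, fr, ca, ar, eu, el, ro, oc, each field holding an "ID categoryName" pair. The function name parse_common_cats and the handling of category names containing spaces are assumptions for the sketch, not part of the dataset documentation.

# Sketch: parse the multilingual domain list, assuming the layout described above.
LANGS = ["en", "es", "de", "fr", "ca", "ar", "eu", "el", "ro", "oc"]

def parse_common_cats(path):
    """Return one dict per domain, mapping language code to its category ID and name."""
    domains = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # One tab-separated field per language, in the fixed order above.
            fields = line.rstrip("\n").split("\t")
            entry = {}
            for lang, field in zip(LANGS, fields):
                # Split on the first blank space only, so category names that
                # themselves contain spaces stay intact (assumption).
                cat_id, _, cat_name = field.partition(" ")
                entry[lang] = {"id": cat_id, "name": cat_name}
            domains.append(entry)
    return domains

if __name__ == "__main__":
    domains = parse_common_cats("commonCats2015.enesdefrcaareuelrooc.tsv")
    print(len(domains), "domains loaded")  # 743 expected, per the description above
    print(domains[0].get("en"))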
Related records:
A set of corpora for 120 languages automatically collected from Wikipedia and the web. Collected ...
In this paper we present the mapping between WordNet domains and WordNet topics, and the emergent Wi...
This paper describes SW1, the first version of a semantically annotated snapshot of the English Wikip...
We propose a language-independent graph-based method to build a-la-carte article collections on user...
We introduce WikiDoMiner -- a tool for automatically generating domain-specific corpora by crawling...
Multiple approaches to grab comparable data from the Web have been developed up to date. Neverthele...
Wikipedia is not only a large encyclopedia, but lately also a source of linguistic data for various ...
We present a simple but effective method of automatically extracting domain-specific terms using Wik...