We discovered several recurring errors in the current version of the Europarl Corpus, originating both from the website of the European Parliament and from the corpus compilation based thereon. The most frequent error was incompletely extracted metadata, leaving non-textual fragments within the textual parts of the corpus files. This occurs, on average, at every second speaker change. We not only cleaned the Europarl Corpus by correcting several kinds of errors, but also aligned the speakers’ contributions across all available languages and compiled everything into a new XML-structured corpus. This facilitates a more sophisticated selection of data, e.g. querying the corpus for speeches by speakers of a particular political group or in partic...
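The kind of selection mentioned above, picking out speeches by political group from an XML-structured corpus, might look roughly like the following sketch. The element and attribute names (`session`, `speech`, `speaker`, `group`, `language`, `p`), the sample data, and the helper `speeches_by_group` are all hypothetical and chosen only for illustration; the actual schema of the corpus is not given here.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element and attribute names are assumptions made for
# this sketch, not the schema of the corpus described above.
SAMPLE = """
<session date="2000-01-17">
  <speech speaker="A" group="PSE" language="en">
    <p>First contribution.</p>
  </speech>
  <speech speaker="B" group="PPE-DE" language="de">
    <p>Second contribution.</p>
  </speech>
  <speech speaker="C" group="PSE" language="fr">
    <p>Third contribution.</p>
    <p>Continued.</p>
  </speech>
</session>
"""


def speeches_by_group(root, group):
    """Return (speaker, text) pairs for every speech whose group attribute matches."""
    results = []
    for speech in root.iter("speech"):
        if speech.get("group") == group:
            # Join the paragraphs of one speech into a single string.
            text = " ".join((p.text or "").strip() for p in speech.iter("p"))
            results.append((speech.get("speaker"), text))
    return results


root = ET.fromstring(SAMPLE)
pse_speeches = speeches_by_group(root, "PSE")
for speaker, text in pse_speeches:
    print(speaker, text)
```

Because speaker metadata sits in attributes rather than inside the running text, a filter like this does not need to parse the speech text itself, which is the practical benefit of separating metadata from textual content.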
This paper describes methods and results for the annotation of two discourse-level phenomena, connec...
This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with thr...
This chapter describes the steps involved in the construction of EPTIC, an intermodal corpus of Euro...
The freely available European Parliament Proceedings Parallel Corpus, or Europarl, is one of the lar...
The Europarl corpus, short for “European Parliament Proceedings Parallel Corpus 1996-2011”, is provi...
This dataset contains comparable translational corpora extracted from the European Parliament Procee...
Europarl is a large multilingual corpus containing the minutes of the debates at the European Parlia...
The European Union institutions are increasingly present in the lives of European citizens, particul...
This dataset contains directional parallel corpora extracted from the European Parliament Proceeding...
This corpus is part of Deliverable 5.5 of the European Commission project QTLeap FP7-ICT-2013.4.1-6...
ESIC (Europarl Simultaneous Interpreting Corpus) is a corpus of 370 speeches (10 hours) in English, ...
This paper presents an analysis of disfluencies in the European Parliament Interpreting Corpus (EPIC...
ESIC (Europarl Simultaneous Interpreting Corpus) is a corpus of 370 speeches (10 hours) in English, ...
We are presenting a new highly multilingual document-aligned parallel corpus called DCEP - Digital Co...