This paper describes our efforts to build a multilingual heritage corpus of alpine texts. We are currently digitizing the yearbooks of the Swiss Alpine Club, which contain articles in French, German, Italian and Romansch. The articles comprise mountaineering reports from all corners of the earth, as well as scientific contributions on topics such as topography, geology and glaciology, and occasional poetry and lyrics. We have already scanned close to 70,000 pages, which has resulted in a corpus of 25 million words, 10% of which is a parallel French-German corpus. We have solved a number of challenges in automatic language identification and text structure recognition. Our next goal is to identify the great variety of toponyms (e.g. names of mountains and valleys, ...
A digital corpus on variation in German (1800-1950) The German Innsbruck Corpus (GermInnC) 1800-195...
The present paper shows results of a study on two historically related but geographically separated ...
The SWISS TEXT CORPUS (CHTK) has made it its goal to extensively document the German language of the...
This paper introduces our approach towards annotating a large heritage corpus, which spans over 100 ...
This paper describes experiments in detecting and annotating code-switching in a large multilingual ...
Swiss dialects of German are, unlike many dialects of other standardised languages, widely used in e...
Geotagging historic and cultural texts provides valuable access to heritage data, enabling location-...
Swiss dialects of German are, unlike many dialects of other standardised languages, widely used in e...
Although Swiss dialects of German are widely used in everyday communication, automatic processing of...
As is well-known, the Alps are a zone of long-standing, intensive contact and multilingualism among ...
This paper describes a method for extracting parallel sentences from comparable texts. We present th...
In this paper, we report on recent digitization efforts of the linguistic atlas of German-speaking S...
In this paper we describe our efforts in reducing and correcting OCR errors in the context of buildi...
In this paper, we present the findings of the Shared Task on Swiss German Language Ide...
Abstract: In this paper, we present a corpus for heritage Bosnian/Croatian/Montenegrin/Serbian (BCMS...