We present a novel approach to (written) dialect identification based on the discriminative potential of entire words. We generate Swiss German dialect words from a Standard German lexicon using hand-crafted phonetic/graphemic rules that are associated with occurrence maps extracted from a linguistic atlas created through extensive empirical fieldwork. Compared with a character n-gram approach to dialect identification, our model is more robust to individual spelling differences, which are frequent in non-standardized dialect writing. Moreover, it covers the whole Swiss German dialect continuum, which trained models struggle to achieve because training data are sparse.
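The word-based idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two rewrite rules, the region labels "A", "B" and "C", the helper names `derive`, `build_index` and `identify`, and the 1/n vote weighting are all assumptions made up for this example, and the toy region sets stand in for the atlas-derived occurrence maps.

```python
from collections import defaultdict

ALL_REGIONS = {"A", "B", "C"}  # placeholder region labels, not real dialect areas

# Toy rules, invented for illustration (not taken from any atlas):
# standard grapheme sequence -> list of (dialect variant, regions where it occurs)
RULES = {
    "au": [("uu", {"A", "B"}), ("ou", {"C"})],
    "k": [("ch", {"A", "C"}), ("k", {"B"})],
}

def derive(word):
    """Return {dialect_form: regions} derived from one standard word."""
    forms = {word: set(ALL_REGIONS)}
    for pattern, variants in RULES.items():
        new_forms = {}
        for form, regions in forms.items():
            if pattern not in form:
                # Rule does not apply: keep the form and its regions unchanged.
                new_forms[form] = new_forms.get(form, set()) | regions
                continue
            for variant, variant_regions in variants:
                # A derived form is valid only where all applied rules co-occur.
                shared = regions & variant_regions
                if shared:
                    key = form.replace(pattern, variant)
                    new_forms[key] = new_forms.get(key, set()) | shared
        forms = new_forms
    return forms

def build_index(lexicon):
    """Map every generated dialect form to the regions that license it."""
    index = defaultdict(set)
    for word in lexicon:
        for form, regions in derive(word).items():
            index[form] |= regions
    return index

def identify(text, index):
    """Score regions by the words they license; return the best one or None."""
    scores = defaultdict(float)
    for token in text.lower().split():
        regions = index.get(token)
        if regions:
            for region in regions:
                # A word shared by many regions is less discriminative.
                scores[region] += 1.0 / len(regions)
    return max(scores, key=scores.get) if scores else None

index = build_index(["haus", "kind"])
print(identify("huus chind", index))
```

Because the lookup works on whole generated words, a region needs no training text of its own to be covered; it only needs to appear in the region sets attached to the rules. In this sketch, tokens that match no generated form are simply ignored.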
In traditional dialect geography, dialect divisions are based on individual linguistic features. How...
Most work in natural language processing is geared towards written, standardiz...
This thesis proposes to combine methods and data from two rather distant fields of language science ...
Most Natural Language Processing (NLP) applications focus on standardized, written language varietie...
Most machine translation systems apply to written, standardized language varieties. In contrast, we ...
Most work in natural language processing is geared towards written, standardized language varieties....
Although there is a good availability of Swiss German dialect data, very few works have looked at th...
This paper proposes a simple metric of dialect distance, based on the ratio between identical word p...
In the last decades, dialectometry has emerged as a new field of dialectology. As this kind of resea...