This paper presents a rule-based method for converting between colloquial Finnish and standard Finnish. The method relies on a small number of orthographical rules combined with a large language model of standard Finnish for ranking the possible conversions. In addition to the method, the paper presents an evaluation corpus consisting of aligned sentences in colloquial Finnish, orthographically standardised colloquial Finnish and standard Finnish. The method outperforms the baseline of simply treating colloquial Finnish as standard Finnish, but is outperformed by a phrase-based MT system trained on the evaluation corpus. The paper also presents preliminary results which show promise for using normalisation in the machin...
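The generate-and-rank idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the token-level rewrite table `RULES`, the three-sentence `CORPUS`, and the add-one-smoothed bigram model are all assumptions made for illustration, standing in for the paper's orthographical rules and its large language model of standard Finnish.

```python
import itertools
import math
from collections import Counter

# Hypothetical colloquial -> standard rewrites (illustrative only).
# "et" is ambiguous: it may stay "et" (negative verb) or become "että" (that).
RULES = {
    "mä": ["minä"],
    "sä": ["sinä"],
    "oon": ["olen"],
    "et": ["et", "että"],
}

# Toy stand-in for a large corpus of standard Finnish (an assumption).
CORPUS = [
    "minä olen täällä",
    "minä tiedän että sinä olet täällä",
    "sinä et ole täällä",
]


def train_bigram(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    context, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for left, right in zip(tokens, tokens[1:]):
            context[left] += 1
            bigrams[(left, right)] += 1
    return context, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one-smoothed bigram log-probability of a whitespace-tokenised sentence."""
    context, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(left, right)] + 1) / (context[left] + vocab_size))
        for left, right in zip(tokens, tokens[1:])
    )


def candidates(sentence):
    """Apply the rewrite table to every token and expand all combinations."""
    options = [RULES.get(token, [token]) for token in sentence.split()]
    return [" ".join(choice) for choice in itertools.product(*options)]


def normalise(sentence, model):
    """Return the candidate that the language model scores highest."""
    return max(candidates(sentence), key=lambda cand: log_prob(cand, model))


MODEL = train_bigram(CORPUS)
print(normalise("mä tiedän et sä olet täällä", MODEL))
```

In this toy setting the rules produce two candidates for the input, differing only in whether "et" is kept or rewritten as "että", and the bigram counts favour the "että" reading because "tiedän että" and "että sinä" occur in the corpus. A real system would need character-level rules and a far larger model, and the number of candidates grows multiplicatively with the number of ambiguous tokens.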
Hämäläinen, M., Partanen, N., Alnajjar, K., Rueter, J. & Poibeau, T. (2020). Automatic Dialect Adaptat...
The thesis presents a formalism for specifying grammars for automatic controlled language translatio...
We describe the methods and resources used to build FinnTreeBank-3, a 76.4 million token corpus of F...
This paper describes the development of Finnish linguistic resources for use in MedSLT, an Open Sour...
Finnish and English differ extensively in how they encode the use of words in context. Lexicon forms...
Text normalization methods have been commonly applied to historical language or user-generated conte...
Rule-based machine translation requires several sets of rules in various phases of procession. The a...
This paper presents Murreviikko, a dataset of dialectal Finnish tweets which have been dialectologic...
This paper describes the GF Widecoverage MT system submitted to WMT 2015 for translation from Englis...
Machine translation research has progressed in recent years thanks to statistical machine learning m...
Finnish language is peculiar in that it often uses participial phrase constructions instead of when ...
Modern natural language processing tasks such as text simplification or summarization are typically ...
This paper evaluates various character alignment methods on the task of sentence-level standardizati...
As a contribution to the on-going discussions concerning what strategy to use when approaching a new...