What's Changed

- smaller language data footprint with smallest possible impact on performance, using a combination of rules, upper limit on word length, and better data cleaning (#31)
- unsupervised approach to affixes activated by default for some languages
- reviewed rules for English and German (less greedy)
- added rules for Dutch, Finnish, Polish and Russian
- improved Russian and Ukrainian language data (#3)
- improved tokenizer

Full Changelog: https://github.com/adbar/simplemma/compare/v0.9.0...v0.9.1

If you use this software, please cite it using these metadata
Although automatic syllabification is an important component in several natural language tasks, litt...
Machine translation translates a text from one language to another, while text simplification conver...
Spoken data from language-contact situations is extremely varied. This heterogeneity makes it diffic...
better rules for English and German inconsistencies fixed for cy, de, en, ga, sv (#16) docs: added l...
new languages: Armenian, Greek, Macedonian, Norwegian (Bokmål), and Polish language data reviewed fo...
rework done on compounding in MBMA. (still work in progress) lots of improvement in MBMA rule handli...
Many downstream applications are using dependency trees, and are thus relying on dependency parsers p...
The article at hand aggregates the work of our group in automatic processing of simplified German. W...
In this paper, we introduce and demonstrate the online demo as well as the command line interface of...
In this paper we investigate the technique of extending the Moses Statistical Machine Translation (S...
improved models and disambiguation improved tokenization extended rules for Germa
We investigate whether non-configurational languages, which display more word order variation than c...
Changes: bump OM to v1.8.2 add statistics for language usage some small fixe
Most natural language applications have some degree of preprocessing of data: tokenisation, stemming...
The report demonstrates how English can be normalized to make it more suitable as interlingua in mac...