Data-driven segmentation of words into subword units has been used in various natural language processing applications such as automatic speech recognition and statistical machine translation for almost 20 years. Recently, it has become more widely adopted, as models based on deep neural networks often benefit from subword units even for morphologically simpler languages. In this paper, we discuss and compare training algorithms for a unigram subword model, based on the Expectation Maximization algorithm and lexicon pruning. Using English, Finnish, North Sami, and Turkish data sets, we show that this approach is able to find better solutions to the optimization problem defined by the Morfessor Baseline model than its original recursive train...
This paper presents an algorithm for the unsupervised learning of a simple morphology of a natural...
Recently introduced neural machine translation, combined with subword segmentation, has achieved machine translation ...
Many Uralic languages have a rich morphological structure, but lack morphological analysis tools nee...
| openaire: EC/H2020/780069/EU//MeMAD Data-driven segmentation of words into subword units has been u...
In our submission to the SIGMORPHON 2022 Shared Task on Morpheme Segmentation, we study whether an u...
In this work, Morfessor, a morpheme segmentation model and algorithm developed by the organizers of ...
Subword segmenters like BPE operate as a pre-processing step in neural machine translation and othe...
Subwords have become the standard units of text in NLP, enabling efficient open-vocabulary models. W...
Morfessor is a family of methods for learning morphological segmentations of words based on unannota...
We present two methods for unsupervised segmentation of words into morpheme-like units. The model ut...
We present two methods for unsupervised segmentation of words into morpheme-like units. The model ...
The state of the art of handling rich morphology in neural machine translation (NMT) is to break wor...
Machine learning methods are increasingly applied to automated processing of natural language data. ...
Determining optimal units of representing morphologically complex words in the mental lexicon is a c...
Many Uralic languages have a rich morphological structure, but lack tools of morphological analysis ...