This paper describes our research on statistical language modeling of Lithuanian. We investigated the idea of improving sparse n-gram models of the highly inflected Lithuanian language by interpolating them with more complex n-gram models based on word clustering and morphological word decomposition. Words, word base forms and part-of-speech tags were clustered into 50 to 5000 automatically generated classes. Multiple 3-gram and 4-gram class-based language models were built and evaluated on a Lithuanian text corpus of 85 million words. Class-based models linearly interpolated with the 3-gram model reduced perplexity by up to 13% compared with the baseline 3-gram model. Morphological models decreased out-of-vocabulary word ra...
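The linear interpolation and perplexity comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the interpolation weight `lam`, and all probability values below are hypothetical and chosen only to show the arithmetic. It assumes each model has already assigned a probability to every word of a test sequence.

```python
import math


def interpolate(p_word, p_class, lam):
    """Linearly interpolate two model probabilities for the same word.

    lam is the weight of the word-based model; (1 - lam) goes to the
    class-based model. The weight is an assumed, illustrative parameter.
    """
    return lam * p_word + (1.0 - lam) * p_class


def perplexity(probs):
    """Perplexity of a test sequence from its per-word probabilities."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))


# Hypothetical per-word probabilities from a word 3-gram model and a
# class-based model on the same four-word test sequence (made-up values).
p_word_model = [0.1, 0.001, 0.05, 0.0005]
p_class_model = [0.05, 0.01, 0.04, 0.01]

lam = 0.5  # assumed weight; in practice it would be tuned on held-out data
p_mixed = [interpolate(a, b, lam) for a, b in zip(p_word_model, p_class_model)]

baseline_ppl = perplexity(p_word_model)
mixed_ppl = perplexity(p_mixed)
relative_reduction = (baseline_ppl - mixed_ppl) / baseline_ppl

print(f"baseline perplexity: {baseline_ppl:.1f}")
print(f"interpolated perplexity: {mixed_ppl:.1f}")
print(f"relative reduction: {relative_reduction:.1%}")
```

In this toy data the class-based model assigns more mass to the words that the sparse word model scores very low, which is why the mixture lowers perplexity. The size of the reduction here says nothing about the figures reported in the paper.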
We describe an approach for morphological analysis combining a rule-based word level morphological a...
The article aims to introduce one of the first linguistic databases created in Lithuania and to discuss...
The article presents research on the automatic morphological disambiguation of a large Lithuanian text corpus...
This paper presents state-of-the-art language modeling (LM) of Lithuanian, which is highly inflected...
As information technologies develop, large morphologically annotated corpo...
This paper investigates a variety of statistical cache-based language models built upon three corpor...
This paper deals with the usage of parts of speech and their grammatical features in the morphologic...
We present the first statistical dependency parsing results for Lithuanian, a morphologically rich l...
The paper gives an overview of the compilation of the first corpus-based Dictionary of Lithuanian Ph...
The paper deals with the preliminary findings from the morphologically annotated corpus of Lithuania...
This article presents a study of statistical language modeling of Lithuanian using a mixture of genre trigrams...
The article presents a brief overview of studies in the field of computational morphology in Latvian...
The work examines texts by different authors and of different genres written in Lithuanian. The main...