We report an ongoing study of the quantitative characteristics of texts written in different genres. At this stage, we compared Lithuanian and English texts across genres. We used 16 indices that describe the frequency structure of a text and indicate several other characteristics of written texts. The initial study showed significant differences between the indices calculated for genre pairs within the same language. Hierarchical clustering suggested that the indices could serve as features for text categorization/classification by genre, though better results were achieved for Lithuanian texts.
Baltijos pažangių technologijų institutas, Vilnius; Baltijos pažangiųjų technologijų institutas; Taikomosios informatikos katedra; Vilniaus universitetas; Vytauto D...
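To make the idea of a frequency-structure index concrete, the following minimal sketch computes three simple indices from a single text. The function name `frequency_indices` and the choice of indices (type-token ratio, hapax ratio, share of the most frequent word) are assumptions made for illustration; the abstract does not list the 16 indices used in the study, so these should not be read as the authors' feature set.

```python
from collections import Counter
import re


def frequency_indices(text: str) -> dict[str, float]:
    """Compute three illustrative frequency-structure indices for one text.

    Hypothetical helper: the indices are common textbook measures,
    not necessarily any of the 16 indices used in the study.
    """
    # Lowercase, then take runs of letters; the pattern is Unicode-aware,
    # so Lithuanian letters such as "ą" or "ž" are kept inside words.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")

    counts = Counter(tokens)
    n_tokens = len(tokens)   # total running words
    n_types = len(counts)    # distinct word forms
    hapaxes = sum(1 for c in counts.values() if c == 1)  # forms seen once

    return {
        "type_token_ratio": n_types / n_tokens,
        "hapax_ratio": hapaxes / n_types,
        "top_word_share": counts.most_common(1)[0][1] / n_tokens,
    }


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for name, value in frequency_indices(sample).items():
        print(f"{name}: {value:.3f}")
```

Vectors of such indices, one per text, are the kind of input that a hierarchical clustering routine would then group by similarity.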
This article aims to answer two questions: 1) whether cluster analysis is useful for finding rhythm typo...
A simple method for categorizing texts into pre-determined text genre categories using the statistic...
The present paper aims to show how a cross-linguistic analysis based on a parallel corpus can be use...
ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online). We report an ongoing study on statistical character...
We report an ongoing study on statistical characteristics of texts written in different genres. It h...
The paper examines texts by different authors and of different genres, written in Lithuanian. The main...
The Lithuanian language is rather complex and flexible, which considerably complicates the development of efficient algorithms for au...
This paper discusses research on Lithuanian texts of different styles for the development of the met...
This paper presents an effort to provide a level-appropriate study corpus for Lithuanian language le...
It is important to evaluate specificities of alphabets, particularly the letter frequencies while de...
This paper examines automated genre classification of text documents and its role in enabling the ef...
Book: https://doi.org/10.1075/btl.140. The purpose of this article is to compare two translations int...
This thesis is concerned with text typology. In this thesis, the written part of the British Nationa...
Abstract: Our research uses the analysis of a Latin historical corpus to study the indicators struct...