The bachelor thesis focuses on basic pre-processing (tokenization and segmentation) of Czech texts, mainly for the purposes of a Czech internet corpus. The texts for this corpus will be obtained automatically from the World Wide Web, so segmentation is preceded by character encoding recognition, cleaning, and language identification. We performed experiments with two methods of language identification and present their results. The first method compares the most frequent n-grams (substrings of length n) extracted from an unknown document with those extracted from a large Czech corpus. The second employs a model that estimates word probabilities from conditional probabilities of trigrams estimated on the same corpus. For wider usage, we develo...
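As a rough illustration of the first language identification method mentioned in this abstract, the following minimal sketch compares the most frequent character n-grams of an unknown document with those of a reference text. It is not the thesis implementation: the function names `top_ngrams` and `profile_overlap`, the overlap score, the parameters `n` and `k`, and the one-sentence stand-in for a large Czech corpus are all assumptions made for this example.

```python
from collections import Counter


def top_ngrams(text: str, n: int = 3, k: int = 300) -> list[str]:
    """Return the k most frequent character n-grams of text, most frequent first."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:k]]


def profile_overlap(document: str, reference_profile: list[str],
                    n: int = 3, k: int = 300) -> float:
    """Fraction of the document's top n-grams that also occur in the reference profile.

    This overlap score is an assumed similarity measure chosen for simplicity.
    """
    doc_profile = top_ngrams(document, n, k)
    if not doc_profile:
        return 0.0
    reference = set(reference_profile)
    return sum(gram in reference for gram in doc_profile) / len(doc_profile)


# Hypothetical toy stand-in for a large Czech corpus.
czech_sample = (
    "Tato práce se zabývá problematikou segmentace textu "
    "a zpracováním českých textů z internetu."
)
czech_profile = top_ngrams(czech_sample)

czech_score = profile_overlap("Práce se zabývá segmentací českých textů.", czech_profile)
english_score = profile_overlap("The thesis focuses on sentence segmentation.", czech_profile)
```

With a realistic reference corpus, a document would be accepted as Czech when its score exceeds some threshold; that threshold would have to be tuned on held-out data and is not specified here.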
This thesis deals with the problem of text segmentation. For this task, it uses both traditional ...
Title: An Implementation of Methods of Structural Analysis of Czech Complex Sentences Author: Jiří D...
The first step of text analysis is tagging word forms with morphological tags. These tags describe t...
The objective of this work is to implement a segmentation analysis method for the Czech language, including ...
The diploma thesis focuses on unstructured textual data preprocessing in relation to text mining. A ...
This paper deals with automatic sentence boundary detection in spoken Czech using both textual and p...
In our paper, we present the main results of the Czech grant project Internet as a Language Corpus, whos...
The aim of this thesis is to analyse register variation among Czech internet texts. The method is ba...
The aim of this thesis is to explore the possibilities of using n-gram language models for spellchec...
This bachelor thesis explores the current methodology and possibilities of text mining and the subse...
This bachelor thesis deals with text mining, focusing mostly on preprocessing and transf...
Firstly, basic rules of tagging of the Czech language are described as well as problems connected to...
This master's thesis deals with identification of clauses in Czech morphologically annotated sentences...
Processing simple or complex texts (MIME type - application) often requires automatic recognition of...
This thesis is focused on cluster analysis in the field of text mining and its application to real d...