Dividing text into parts of speech is as old as linguistics itself, but automating the process has become possible only in recent decades, thanks to the growth in computing power. Since then, text-processing algorithms have improved year by year. In this master's thesis, one of the flagship tools of the field is put to the test on a corpus of texts written by learners of English whose native language is Estonian (the TCELE corpus). The corpus currently comprises about 25,000 words (127 written essays) and 11 transcribed interviews (~100 minutes). The aim is to estimate the tagging error rate for TCELE and other similar corpora. The first part of the thesis introduces the reader...
Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Jo...
In this study a simple method for automatic correction of part-of-speech corpora is presented, which ...
This paper presents parts-of-speech tagging as a first step towards an autonomous text-to-scene conv...
The use of a corpus as a language resource is enhanced when it is part-of-speech (POS) tagged. There...
Various language technology applications have been in everyday use for decades. ...
Proceedings of the NODALIDA 2009 workshop Constraint Grammar and robust parsing. Editors: Eckhard ...
The article presents the possibilities for recognizing word order errors in Estonian, the methods us...
This paper is concerned with the application of technologies developed in other disciplines, in par...
EMMA1, the Estonian language learners’ text corpus being developed at the Institute of Estonian and ...
This paper presents a short description of work recently done at the University of Tartu to construct a ...
Most corpus-based studies of learner language have been completed in the framework of error analy...
This paper reports on approaches for automatically predicting a learner’s language proficiency in Es...
Statistical n-gram taggers like that of [Church 1988] or [Foster 1991] assign a part-of-speech label...