Optical Character Recognition (OCR) can substantially improve the usability of digitized documents. Language modeling using word lists is known to improve OCR quality for English. For morphologically rich languages, however, even large word lists do not reach high coverage on unseen text. Morphological analyzers offer a more sophisticated approach, which is useful in many language processing applications. This paper investigates language modeling in the open-source OCR engine Tesseract using morphological analyzers. We present experiments on two Uralic languages, Finnish and Erzya. According to our experiments, word lists may still be superior to morphological analyzers in OCR, even for languages with rich morphology. Our error analysis indicate...
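The coverage problem mentioned above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function name `token_coverage` and the toy Finnish data are assumptions made for illustration only. It measures the fraction of tokens in unseen text that appear in a word list, which tends to drop when a language produces many inflected forms per lemma.

```python
def token_coverage(word_list, tokens):
    """Return the fraction of tokens found in the word list (case-insensitive)."""
    vocab = {w.lower() for w in word_list}
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in vocab)
    return hits / len(tokens)


# Hypothetical toy data: a tiny word list and unseen inflected Finnish forms.
word_list = ["talo", "talossa", "auto"]
unseen = ["talo", "talossa", "taloissa", "autossa"]

coverage = token_coverage(word_list, unseen)
print(f"coverage: {coverage:.2f}")
```

In this toy case only two of the four unseen tokens are listed, so half of the text falls outside the word list. A morphological analyzer could in principle accept the remaining forms by analyzing them rather than looking them up.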
A morphological analyzer is a fundamental tool in Natural Language Processing (NLP) that generates the...
This paper describes an initial set of experiments in data-driven morphological analysis of Uralic ...
We will demonstrate several morphological analyzers of languages for which morphological analysis is...
Optical Character Recognition (OCR) is a process of converting text from images to a machine-readabl...
The existence of several documents in historical archives which need to be edited and stored in a co...
In this paper, we deal with the problem of document image rectification from image captured by digit...
This article surveys resource-light monolingual approaches to morphological analysis and tagging. Wh...
This article introduces a corpus-based method for improving the process of automatic morphological a...
This article compares the ability of several morphological operators to impro...
We evaluate two common conjectures in error analysis of NLP models: (i) Morphology is predictive of ...
The purpose of our experiment was to get an estimate of the usefulness of an automatic morphological...
One of the most sophisticated abilities humans have acquired through evolution...
The development of rich, multi-lingual corpora is essential for enabling new types of large-scale in...
Morphological analysis is an essential component in Natural Language Processing (NLP) applications r...