This thesis addresses the post-correction of Icelandic text produced by optical character recognition (OCR). Two methods for correcting OCR spelling errors in Icelandic text are proposed and evaluated on misrecognized words from an ongoing digitization project at Alþingi (the Icelandic parliament). The first method is based on a noisy channel model and is applied to nonword errors, i.e., words that were misrecognized during the OCR process and transformed into a string that is not in the Icelandic vocabulary. It achieves a correction accuracy of 92.9% on a test set of nonword errors from a large collection of digitized parliamentary speeches from the Alþingi digitization project (a total of 47 mil...
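The noisy channel idea mentioned in the abstract can be illustrated with a minimal sketch: choose the vocabulary word w that maximizes P(w) · P(observed | w). This is not the thesis's implementation. The word counts, the OCR confusion probabilities, the default edit probabilities, and the names `channel_prob` and `correct` are all hypothetical, chosen only to make the example runnable.

```python
from collections import Counter

# Toy unigram counts standing in for a language model (hypothetical values).
WORD_COUNTS = Counter({"það": 120, "þing": 40, "þingi": 25, "var": 90})
TOTAL = sum(WORD_COUNTS.values())

# Assumed OCR confusion probabilities, keyed as (intended char, observed char).
CONFUSION = {("þ", "p"): 0.05, ("ð", "d"): 0.04, ("í", "i"): 0.03}
DEFAULT_SUB = 0.001     # any other single-character substitution (assumption)
DEFAULT_INDEL = 0.0005  # one dropped or spurious character (assumption)
IDENTITY = 0.9          # probability that a word is read correctly (assumption)


def channel_prob(observed, intended):
    """P(observed | intended) for strings at most one edit apart, else 0."""
    if observed == intended:
        return IDENTITY
    if len(observed) == len(intended):
        diffs = [(i, o) for i, o in zip(intended, observed) if i != o]
        if len(diffs) == 1:
            return CONFUSION.get(diffs[0], DEFAULT_SUB)
        return 0.0
    longer, shorter = sorted((observed, intended), key=len, reverse=True)
    if len(longer) - len(shorter) == 1:
        # One insertion or deletion: dropping a single character of the
        # longer string must yield the shorter one.
        for k in range(len(longer)):
            if longer[:k] + longer[k + 1:] == shorter:
                return DEFAULT_INDEL
    return 0.0


def correct(observed):
    """Return the vocabulary word maximizing P(word) * P(observed | word)."""
    scored = {
        word: (count / TOTAL) * channel_prob(observed, word)
        for word, count in WORD_COUNTS.items()
    }
    best = max(scored, key=scored.get)
    # Leave the token unchanged when no candidate is within one edit.
    return best if scored[best] > 0 else observed
```

For example, `correct("ping")` returns `"þing"` under these toy numbers, because the only in-vocabulary word one edit away is reached through the assumed þ→p confusion. A realistic system would estimate both probability tables from corpus data rather than hard-coding them.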
In this thesis we describe a spelling correction system designed specifically for OCR (Optical Chara...
Understanding handwritten and printed text is easier for humans but computers do not have the same l...
We present an approach for automatic detection and correction of OCR-induced misspellings in histori...
This thesis discusses context-sensitive automatic spelling correction, the construction and ...
Optical character recognition (OCR) has many applications, such as...
This paper presents experiments on Optical character recognition (OCR) as a combination of Ocropy so...
The purpose of this paper is to compare two basic post-processing algorithms for correction of optic...
In this thesis, four attempts to improve the tagging accuracy for Icelandic tex...
In this paper we describe our efforts in reducing and correcting OCR errors in the context of buildi...
This master’s thesis project undertook the investigation of whether spelling correction would improv...
This thesis discusses the design and implementation of an OCR post processing system. The system is ...
Technological development plays an increasingly important role in all our lives, and there is a large increase in ...
The optical character recognition (OCR) quality of the historical part of the Finnish newspaper and ...
In this article, we study correction of spelling errors, specifically on how the spelling errors are...
Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. Editors: Kri...