This thesis presents a system for correcting errors from optical character recognition (OCR) software. As a noisy-channel error correction system, it uses a language model to provide a prior over the true text. For this purpose, we introduce a lexicalized version of Klein and Manning’s Dependency Model with Valence, a grammar that is trained without structure annotation. The novel language model provides error correction performance that is slightly better than a 4-gram baseline on a corpus of historical English text. When interpolated with the 4-gram model, a relative reduction in word error rate of 32.1% is achieved, which is 2.5% more than the 4-gram model alone. However, the improvement is primarily attributable not to the modeling of ...
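The noisy-channel setup described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the interpolation is assumed to be linear with weight `lam`, the two language models and the channel model are toy lookup tables, and the function names (`interpolate`, `noisy_channel_best`, `relative_reduction`) are hypothetical.

```python
import math

def interpolate(p_a: float, p_b: float, lam: float) -> float:
    """Linearly interpolate two language-model probabilities (assumed form)."""
    return lam * p_a + (1.0 - lam) * p_b

def noisy_channel_best(observed, candidates, channel, lm_a, lm_b, lam=0.5):
    """Pick the candidate w maximizing log P(observed | w) + log P(w).

    channel maps (observed, w) to P(observed | w); lm_a and lm_b map w to
    a prior probability. All three are toy tables, not real models.
    """
    def score(w):
        prior = interpolate(lm_a[w], lm_b[w], lam)
        return math.log(channel[(observed, w)]) + math.log(prior)
    return max(candidates, key=score)

def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative word error rate reduction with respect to a baseline."""
    return (baseline_wer - system_wer) / baseline_wer

# Toy example: the OCR output "tbe" and three candidate corrections.
CANDIDATES = ["the", "tbe", "toe"]
CHANNEL = {("tbe", "the"): 0.2, ("tbe", "tbe"): 0.7, ("tbe", "toe"): 0.1}
LM_A = {"the": 0.05, "tbe": 1e-6, "toe": 0.001}   # stand-in for the grammar-based model
LM_B = {"the": 0.04, "tbe": 2e-6, "toe": 0.002}   # stand-in for the 4-gram model

best = noisy_channel_best("tbe", CANDIDATES, CHANNEL, LM_A, LM_B)
print(best)
```

In this toy setting the channel model alone would keep the observed string, since leaving "tbe" unchanged has the highest channel probability; it is the language-model prior that moves the decision to "the". The numbers are invented and say nothing about the thesis's actual results.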
We consider a model for which it is important, early in processing, to estimate some variables with...
One of the major challenges of using historical document collections for research is the fact that O...
In this thesis we describe a spelling correction system designed specifically for OCR (Optical Chara...
An essential component of many applications in natural language processing is a language modeler abl...
Optical character recognition (OCR) is a recognition system used to recognize the substance of a che...
In this paper, stochastic error-correcting parsing is proposed as a powerful and flexible method to ...
This paper describes a new expert system for automatically correcting errors made by optical charact...
In this thesis, we report on our experiments on training and categorization of optically recognized ...
Optical Character Recognition (OCR) Post Processing involves data cleaning steps for documents that ...
In this paper, we describe a spelling correction system designed specifically for OCR-generated text...
Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not...
Post processing is the most conventional approach for correcting errors that are caused by Optical C...
In this paper we present a novel approach to the automatic correction of OCR-i...
Understanding handwritten and printed text is easier for humans but computers do not have the same l...