Abstract We give a comprehensive report on our experiments with retrieval from OCR-generated text using systems based on standard models of retrieval. More specifically, we show that for a set of queries, average precision and recall are not affected by OCR errors across these systems. We also demonstrate that the ranking and feedback methods associated with these models are generally not robust enough to deal with OCR errors. We further show that OCR errors and garbage strings generated from the mistranslation of graphic objects increase the size of the index by a wide margin.
Presented in this thesis is a study of the effect of OCR errors on short documents. OCR recognizes a...
OCR error has been shown not to affect the average accuracy of text retrieval or text categorization...
Digitized document collections often suffer from OCR errors that may impact a document's readability...
Optical Character Recognition (OCR) is a critical part of many text-based applications. Although som...
We report on the performance of the vector space model in the presence of OCR errors. We show that a...
Character accuracy of optically recognized text is considered a basic measure for evaluating OCR dev...
The difficulty with information retrieval for OCR documents lies in the fact that OCR documents comp...
We report on the results of our experiments on query evaluation in the presence of noisy data. In pa...
The major problem with retrieval of OCR text is the unpredictable distortion of characters due to re...
Important legacy paper documents are digitized and collected in online accessible archives. This ena...
This thesis reports on the effects of an automatic query expansion with a subject specific thesaurus...
In this paper we describe experiments that investigate the effects of OCR errors on text categorizat...