This thesis presents a study of the effect of OCR errors on short documents. OCR recognizes text in an image and translates it into ASCII format. When this data is retrieved in response to a query, retrieval performance depends on the efficiency of the OCR device used. Recall, precision, and ranking were used to gauge retrieval performance. The information retrieval system used is SMART, which is based on the vector space model. Evaluation of these measures showed that average precision and recall are not affected significantly when the OCR collection is compared to its corrected version. However, with more complex weighting schemes, the relevant document rankings became more div...
Humanities scholars increasingly rely on digital archives for their research instead of time-consumi...
Systems that predict optical character recognition (OCR) accuracy of an input image by a given OCR s...
In this thesis we describe a spelling correction system designed specifically for OCR (Optical Chara...
Optical Character Recognition (OCR) is a critical part of many text-based applications. Although som...
We give a comprehensive report on our experiments with retrieval from OCR-generated text using syste...
Important legacy paper documents are digitized and collected in online accessible archives. This ena...
We report on the performance of the vector space model in the presence of OCR errors. We show that a...
Optical Character Recognition (OCR) Post Processing involves data cleaning steps for documents that ...
This thesis reports on the effects of an automatic query expansion with a subject specific thesaurus...
Digitized document collections often suffer from OCR errors that may impact a document's readability...
Post-OCR is an important processing step that follows optical character recognition (OCR) and is mea...
This thesis discusses the design and implementation of an OCR post processing system. The system is ...
Humanities scholars increasingly rely on digital archives for their research in place of...
We give a comprehensive report on our experiments with retrieval from OCR generated text us...