In this paper, we describe experiments that investigate the effects of OCR errors on text categorization. In particular, we show that, in our environment, OCR errors have no effect on categorization when we use a classifier based on the naive Bayes model. We also observe that dimensionality reduction techniques eliminate a large number of OCR errors and improve categorization results.
Today, there is an increasing demand for efficient archival and retrieval metho...
Historical documents pose a challenge for character recognition due to various reasons such as font ...
Naïve Bayes, k-nearest neighbors, Adaboost, support vector machines and neural networks are five amo...
In this thesis, we report on our experiments on training and categorization of optically recognized ...
We give a comprehensive report on our experiments with retrieval from OCR-generated text using syste...
We report on the performance of the vector space model in the presence of OCR errors. We show that a...
Post-OCR is an important processing step that follows optical character recognition (OCR) and is mea...
The current general approach to digitizing paper media is to convert them into digital images by a...
Mie University, Graduate School of Engineering, Master's Program, Division of Information Engineering. Optical character reader (OCR) systems can be used in digitizing print docum...
OCR error has been shown not to affect the average accuracy of text retrieval or text categorization...
This paper explores in detail the use of Error Correcting Output Coding (ECOC) for learning text c...
This work presents categorization experiments performed over noisy texts. By noisy, we mean any text...
Text categorization is a fundamental task in document processing, allowing the automated handling of...