The current general approach to digitizing paper media is to convert them into digital images with a scanner and then read those images with OCR to generate ASCII text for full-text retrieval. However, present OCR technology cannot recognize every character with 100% accuracy. It is therefore important to understand how OCR accuracy affects automatic text classification, in order to establish whether classifying OCR output is technically feasible. In this research, we perform automatic text classification experiments on English newswire articles to study the relationship between OCR accuracy and text classification accuracy using statistical classification techniques.
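The kind of experiment described above can be sketched in a few lines: inject character errors at a controlled rate, then measure how a statistical classifier trained on clean text performs on the noisy text. This is a minimal sketch under stated assumptions, not the paper's setup. It assumes a uniform random letter-substitution model as a stand-in for real OCR errors (which depend on fonts, glyph shapes and scan quality) and uses multinomial naive Bayes as one example of a statistical classifier. The function and class names, the toy sentences and the "economy"/"sport" labels are all hypothetical; the paper itself uses English newswire articles.

```python
import math
import random
import re
import string
from collections import Counter, defaultdict


def simulate_ocr_noise(text, error_rate, rng):
    """Hypothetical noise model: each letter is replaced, with probability
    `error_rate`, by a different random lowercase letter. Non-letters are kept."""
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < error_rate:
            choices = [c for c in string.ascii_lowercase if c != ch.lower()]
            out.append(rng.choice(choices))
        else:
            out.append(ch)
    return "".join(out)


def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNaiveBayes:
    """Bag-of-words multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / n_docs)  # log prior
            for w in tokenize(doc):
                # Smoothing keeps unseen (e.g. OCR-garbled) words from zeroing the score.
                score += math.log((self.word_counts[y][w] + 1) / (self.totals[y] + v))
            if score > best_score:
                best_label, best_score = y, score
        return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy data standing in for categorized newswire articles.
train_docs = [
    "stocks rallied as the central bank cut interest rates",
    "shares fell after the company reported weaker earnings",
    "the bank raised its profit forecast for the quarter",
    "the striker scored twice as the team won the match",
    "the coach praised the players after the league final",
    "the team lost the match despite a late goal",
]
train_labels = ["economy"] * 3 + ["sport"] * 3
test_docs = [
    "the central bank kept interest rates unchanged",
    "the team scored a late goal to win the final",
]
test_labels = ["economy", "sport"]

if __name__ == "__main__":
    rng = random.Random(0)
    model = MultinomialNaiveBayes().fit(train_docs, train_labels)
    # Train on clean text, test on text corrupted at increasing error rates.
    for rate in (0.0, 0.05, 0.1, 0.2, 0.4):
        noisy = [simulate_ocr_noise(d, rate, rng) for d in test_docs]
        print(f"char error rate {rate:.2f}: accuracy {accuracy(model, noisy, test_labels):.2f}")
```

Training on clean text and testing on corrupted text isolates the effect of recognition errors; a realistic study would instead run actual OCR at different scan qualities and measure character accuracy against ground truth.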
Digital humanities research that requires the digitization of medium-scale, project-specific texts c...
Optical Character Recognition (OCR) is a technique used to convert a scanned image into editable text...
Post-OCR processing is an important step that follows optical character recognition (OCR) and is mea...
Mie University Graduate School of Engineering, Master's Course in Information Engineering. Optical Character Reader (OCR) systems can be used in digitizing print docum...
The user expectation from a digitized collection is that a full-text search can be performed and tha...
Optical Character Recognition (OCR) is a critical part of many text-based applications. Although som...
In the literature, many feature types and learning algorithms have been proposed for document classification. ...
The aim of the OCR Index project was to investigate the feasibility of optical character recognition...
Digital documents are easier to handle, share and store than hard copies of documents. This has made people...
In this paper we describe experiments that investigate the effects of OCR errors on text categorizat...
Effects of Optical Character Recognition (OCR) quality on historical information retrieval have so f...
Optical character recognition (OCR) is one of the most popular techniques used...