We present an efficient and effective approach to training OCR engines using the Aletheia document analysis system. All components required for training are seamlessly integrated into Aletheia: training data preparation, the OCR engine’s training processes themselves, text recognition, and quantitative evaluation of the trained engine. Such a comprehensive training and evaluation system, guided through a GUI, allows for iterative incremental training to achieve the best results. The widely used Tesseract OCR engine serves as a case study to demonstrate the efficiency and effectiveness of the proposed approach. Experimental results are presented validating the training approach with two different historical datasets, representative of recent signi...
The creation of a high-quality optical character recognition system (OCR) requires a large amount of...
This master’s thesis describes the work in creating a customised optical character recognition (OCR)...
Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not...
Optical Character Recognition (OCR) on historical printings is a challenging task mainly due to the ...
Part 7: Deep Learning - Convolutional ANN. This work aims at data preparation fo...
The aim of this report is to compare OCR accuracy of two well known OCR engines: Tesseract 3.0.1 and...
Optical Character Recognition (OCR) is the mechanical or electronic translation of scanned images of...
A method is presented that significantly reduces the character error rates for OCR text obtained fro...
GT4HistOCR contains ground truth for research in Optical Character Recognition (OCR) technology appl...
Previous research has compared the performance of OCR (optical character recognition) engines strict...
The user expectation from a digitized collection is that a full text search can be performed and tha...
The millions of pages of historical documents that are digitized in libraries are increasingly used ...