The word error rate of any optical character recognition (OCR) system is usually substantially higher than its component or character error rate. This is especially true of Indic languages, in which a word consists of many components. Current OCRs recognize each character or word separately and do not take advantage of document-level constraints. We propose a document-level OCR which incorporates information from the entire document to reduce word error rates. Word images are first clustered using a locality-sensitive hashing technique. Individual words are then recognized using a (regular) OCR. The OCR outputs of word images in a cluster are then corrected probabilistically by comparing them with the OCR outputs of other members of the same cluster...
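The three-stage pipeline described above (hash word images into clusters, OCR each word, then correct each word against its cluster mates) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' method: each word image is assumed to be already reduced to a fixed-length feature vector, clustering uses a single table of random-hyperplane (SimHash-style) LSH signatures, and the probabilistic correction is replaced by a simple confidence-weighted vote over whole-word OCR strings. The function names and the `confidences` input are hypothetical.

```python
import numpy as np
from collections import defaultdict


def lsh_signatures(features, n_planes=16, seed=0):
    """Random-hyperplane LSH: one bit per hyperplane (sign of the projection).

    `features` is assumed to be an (n_words, dim) array of word-image features.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((features.shape[1], n_planes))
    bits = (features @ planes) >= 0  # shape (n_words, n_planes)
    # Convert each row of bits into a hashable tuple of Python bools.
    return [tuple(row.tolist()) for row in bits]


def cluster_by_signature(signatures):
    """Group word indices that share an identical LSH signature (one hash table)."""
    clusters = defaultdict(list)
    for idx, sig in enumerate(signatures):
        clusters[sig].append(idx)
    return list(clusters.values())


def correct_cluster(ocr_outputs, confidences):
    """Hypothetical correction step: a confidence-weighted vote over whole strings.

    Every member of the cluster is assigned the string with the highest total
    confidence. This stands in for the paper's probabilistic correction.
    """
    scores = defaultdict(float)
    for text, conf in zip(ocr_outputs, confidences):
        scores[text] += conf
    best = max(scores, key=scores.get)
    return [best] * len(ocr_outputs)


def document_level_correct(features, ocr_outputs, confidences, n_planes=16, seed=0):
    """Cluster word images with LSH, then correct OCR outputs within each cluster."""
    corrected = list(ocr_outputs)
    for members in cluster_by_signature(lsh_signatures(features, n_planes, seed)):
        fixed = correct_cluster(
            [ocr_outputs[i] for i in members],
            [confidences[i] for i in members],
        )
        for i, text in zip(members, fixed):
            corrected[i] = text
    return corrected


if __name__ == "__main__":
    a = np.array([1.0, 2.0, 3.0])
    feats = np.stack([a, a, a, -a, -a])  # two distinct "word images", repeated
    print(document_level_correct(
        feats,
        ["ocr", "0cr", "ocr", "word", "w0rd"],
        [0.9, 0.6, 0.8, 0.7, 0.5],
    ))
```

A fuller version would typically use several hash tables to balance recall against precision, and would align and vote at the character or component level rather than over whole strings, which matters for Indic scripts where a single word spans many components.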
Texts in Indic Languages contain a large proportion of out-of-vocabulary (OOV) words due to frequent...
OCR is an error-prone process when input images are degraded. Most c...
Conventional optical character recognition (OCR) systems operate on individual characters and words,...
This conference paper was presented in the International Conference on Data and Software Engineering...
Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not...
This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelo...