This paper describes a general approach for automatically preparing digitised documents for storage and processing on various digital platforms. The focus is on documents that are not suitable for optical character recognition (OCR) methods but exhibit regular structures in the form of text-like blocks. By extracting a document-immanent alphabet, preserving the graphical representations by means of vectorisation, and encoding the original document on the basis of these steps, it is possible to obtain the benefits of encoded text without the effort and the possible mistakes that arise from recognition methods. The use of the Extensible Markup Language (XML) for structural descriptions and Scalable Vector Graphics (SVG) for graphical represe...
Many social science researchers face the challenge of dealing with textual data that is only availab...
This paper is devoted to methods for speeding up optical character recognition, which is used for ...
Includes bibliographical references (pages [75]-80). A generalized computer-based automated documentat...
Owing to a boom in information technologies, optical character recognition has recently become a popu...
The paper explains how an Optical Character Recognition system (OCR) works and how this system enabl...
Digital documents are easier to handle, share and store than hard copies of documents. These made people...
Digitization of paper-bound documents is one of the foremost commercial interests worldwi...
Table recognition is an important tool for digitalizing documents that contain tabular data, which...
document image analysis system that can transform paper documents into XML format [1]. An effective ...
Optical Character Recognition (OCR) is a technology that recognizes text in documents and converts i...
The transformation of scanned paper documents to a form suitable for an Internet browser is a comple...
This paper shows an approach for converting bitmap images of text glyphs into a vector format which ...
According to Wikipedia, Optical Character Recognition (OCR) “is the mechanical or electronic transla...
This paper presents the use of XML format for document modelling and describin...
Some thoughts on a work-in-progress digital edition project. Limits and advantages of OCR (Optical C...