Digitization of newspapers is of interest for many reasons, including preservation of history, accessibility, and searchability. While digitization of documents such as scientific articles and magazines is prevalent in the literature, one of the main challenges in digitizing newspapers is the analysis of their complex layouts (e.g., articles spanning multiple columns, text interrupted by images), which is necessary to preserve human reading order. This work provides a major breakthrough in the digitization of newspapers on three fronts: first, releasing a dataset of 3000 fully annotated, real-world newspaper images from 21 different U.S. states representing an extensive variety of complex layouts for document layout analysis; second, proposing ...
The massive amounts of digitized historical documents acquired over the last decades naturally lend ...
Intelligent document segmentation can bring electronic browsing within the reach of most users. The ...
In this article, we show how some concepts found in traditional and old layout practices used to lay...
Optical Character Recognition (OCR) is commonly used nowadays for printout and document conversion...
Abstract: Digitization of paper-bound documents is one of the foremost commercial interests worldwi...
Information can include text, pictures and signatures that can be scanned into a document format, su...
Newspapers are documents made of news items and informative articles. They are ...
One of the objectives of document image segmentation is to decompose a digitized document image in...
With the advent of more powerful personal computers, inexpensive memory, and digital cameras, curato...
Abstract (cf. The Book of Abstracts, p. 11-13): The use of convolutional neural networks in digitiz...
We present an early version of a complete Optical Character Recognition (OCR) system for Tamil newsp...
Machine understanding of documents has become a fundamental element in applications dealing with lar...
DVD-ROM Appendix available with the print copy of this thesis. National and international initiatives...
Digital documents are easier to handle, share, and store than hard copies of documents. These made people...