Automatic layout analysis of historical documents has to cope with a large number of different scripts, writing supports, and digitization qualities. Under these conditions, designing robust features for machine learning is a highly challenging task. We use convolutional autoencoders to learn features from the images. To increase classification accuracy and reduce the feature dimension, we propose a novel feature selection method that cascades adapted versions of two conventional methods. Compared to three conventional methods and our previous work, the proposed method achieves higher classification accuracy in most cases while maintaining a low feature dimension. In addition, we find that a signi...
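As a rough illustration of what cascading two feature selection methods can look like, the sketch below chains a filter stage with a wrapper stage. The abstract does not name the two conventional methods, so the choices here are assumptions made purely for illustration: a Fisher-score filter, followed by greedy forward selection scored with a nearest-centroid classifier. All function names and the toy data are hypothetical, and this is not the authors' method.

```python
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall = mean(column)
    between = within = 0.0
    for c in sorted(set(labels)):
        vals = [v for v, lab in zip(column, labels) if lab == c]
        between += len(vals) * (mean(vals) - overall) ** 2
        within += len(vals) * pvariance(vals)
    return between / (within + 1e-12)  # small constant avoids division by zero


def filter_stage(X, y, keep):
    """Stage 1 (assumed filter): keep the `keep` features with the highest score."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]


def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier on the given features."""
    classes = sorted(set(y))
    centroids = {
        c: [mean([row[j] for row, lab in zip(X, y) if lab == c]) for j in features]
        for c in classes
    }
    correct = 0
    for row, lab in zip(X, y):
        pred = min(
            classes,
            key=lambda c: sum((row[j] - m) ** 2 for j, m in zip(features, centroids[c])),
        )
        correct += pred == lab
    return correct / len(y)


def wrapper_stage(X, y, candidates):
    """Stage 2 (assumed wrapper): add a feature only if accuracy strictly improves."""
    selected, best = [], 0.0
    while True:
        best_j = None
        for j in candidates:
            if j in selected:
                continue
            acc = centroid_accuracy(X, y, selected + [j])
            if acc > best:
                best, best_j = acc, j
        if best_j is None:
            return selected, best
        selected.append(best_j)


def cascade_select(X, y, keep):
    """Run the filter first, then the wrapper on the surviving features."""
    return wrapper_stage(X, y, filter_stage(X, y, keep))


# Toy data: features 0 and 3 separate the classes, 1 is noise, 2 is constant.
X = [
    [0.0, 5.0, 1.0, 0.1],
    [0.1, 1.0, 1.0, 0.0],
    [0.2, 3.0, 1.0, 0.2],
    [1.0, 4.0, 1.0, 0.9],
    [1.1, 2.0, 1.0, 1.0],
    [0.9, 3.0, 1.0, 1.1],
]
y = [0, 0, 0, 1, 1, 1]
selected, accuracy = cascade_select(X, y, keep=2)
```

In this arrangement the cheap filter discards clearly uninformative features before the more expensive wrapper searches the remainder, which is one common motivation for cascading. On the toy data the wrapper stops after a single feature because the second, redundant one cannot improve accuracy further.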
Feature selection methods are often applied in the context of document classification. They are part...
In this paper, we propose a new dataset and a ground-truthing methodology for layout analysis of his...
Mass digitization of historical documents is a challenging problem for optical character recognition...
Layout analysis of historical handwritten documents is a key pre-processing step in document image a...
In this paper, we present an unsupervised feature learning method for page segmentation of historica...
The term "historical documents" encompasses an enormous variety of document types considering differ...
The solution for a feature selection problem is presented in the field of document image processing....
Document layout segmentation and recognition is an important task in the creation of digitized docum...
Recently, texture features have been widely used for historical document image analysis. However, fe...
In this project, a state-of-the-art CV model called Mask Region Based Convolutional Neural Networks ...
One of the objectives of document image segmentation is to decompose a digitized document image in...
Recent progress in the digitization of heterogeneous collections of ancient do...
Document classification has been involved in a variety of applications, such as phishing and fraud d...
Background. In recent years, libraries and archives led important digitisation...