In the literature, many feature types and learning algorithms have been proposed for document classification. However, an extensive and systematic evaluation of the various approaches has not yet been carried out. To investigate different text representations for document classification, we have developed a tool that transforms documents into feature-value representations suitable for standard learning algorithms. In this paper we investigate seven document representations for German texts based on n-grams and single words. We compare their effectiveness in classifying OCR texts and the corresponding correct ASCII texts in two domains: business letters and abstracts of technical reports. Our results indicate that the use of n-grams is an attracti...
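As a rough illustration of the two families of representations named above (single words and n-grams), the following minimal sketch turns a string into feature-value counts. It is not the tool described in the abstract: the function names `char_ngrams` and `word_features`, the choice of character trigrams, and the example OCR error are all assumptions made for illustration only.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def word_features(text: str) -> Counter:
    """Count whitespace-separated, lower-cased words."""
    return Counter(text.lower().split())


clean = "Rechnung"
noisy = "Rechnurg"  # hypothetical OCR error: one character misread

# Counter intersection keeps only the features both versions have in common.
shared_words = word_features(clean) & word_features(noisy)
shared_trigrams = char_ngrams(clean) & char_ngrams(noisy)

print(len(shared_words), "shared word features")
print(len(shared_trigrams), "shared trigram features")
```

In this toy case a single misread character removes the only word feature, while most of the character trigrams still match. This is one commonly cited reason for considering n-gram features on noisy OCR text; the sketch makes no claim about the results reported in the paper.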
Text classification is used to group documents of similar types. Text classification can be a...
Thesis (M.S.)--University of Hawaii at Manoa, 2008. Includes bibliographical references (leaves 56-58...
We conduct an assessment of the impact of OCR quality in collections in Dutch, considering two tasks...
(Automatic) document classification is generally defined as content-based assignment of one or more ...
The classification of digital documents is a complex task in a document analys...
In this paper we perform a comparative analysis of three models for a feature representation of text...
Conventionally, document classification research focuses on improving the learning capabilities of c...
With the increasing availability of electronic documents and the rapid growth of the Worl...
This paper investigates the problem of text classification. The task of text classification is to as...
The current general approach to digitizing paper media is to convert them into digital images by a...
This thesis presents the application of various classification techniques on text documents. Since t...
Automatic text classification is the process of automatically classifying text documents into pre-de...
The current general approach to digitizing paper media is to convert them into digital i...
This paper provides an analysis of multi-class e-mail categorization performance. In orde...