This paper presents INFOCLAS, a system that applies statistical information retrieval methods to classify German business letters into message types such as order, offer, and enclosure. INFOCLAS is a first step towards document understanding, in which classification drives the subsequent extraction of information. The system consists of two main modules: the central indexer, which extracts and weights indexing terms, and the classifier, which assigns business letters to the given types. It draws on several knowledge sources: a letter database, word frequency statistics for German, lists of message-type-specific words, morphological knowledge, and the underlying document structure. As ...
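The two-module design described in the abstract (an indexer that extracts and weights terms, and a classifier that assigns a message type) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state which weighting or similarity measure INFOCLAS uses, so this sketch swaps in plain TF-IDF weights and cosine similarity, and the function names, the four toy letters, and the type labels are hypothetical. It also leaves out the morphological knowledge and document structure the abstract mentions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (no morphology)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def build_index(labelled_letters):
    """Indexer sketch: IDF per term plus one summed TF-IDF profile per message type."""
    n_docs = len(labelled_letters)
    doc_freq = Counter()
    for text, _ in labelled_letters:
        doc_freq.update(set(tokenize(text)))
    # Assumed weighting: log(N / df) + 1, a common TF-IDF variant.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    profiles = defaultdict(Counter)
    for text, message_type in labelled_letters:
        for term, tf in Counter(tokenize(text)).items():
            profiles[message_type][term] += tf * idf[term]
    return idf, dict(profiles)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, idf, profiles):
    """Classifier sketch: rank message types by similarity to the letter."""
    counts = Counter(tokenize(text))
    # Terms never seen by the indexer carry no weight and are skipped.
    vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = {mt: cosine(vec, prof) for mt, prof in profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy letter database, invented for this example.
TRAINING = [
    ("hiermit bestellen wir zehn drucker", "order"),
    ("wir bestellen folgende artikel", "order"),
    ("wir bieten ihnen folgende drucker an", "offer"),
    ("unser angebot gilt bis ende mai", "offer"),
]

idf, profiles = build_index(TRAINING)
ranking = classify("bitte bestellen sie die artikel", idf, profiles)
print(ranking[0][0])  # best-scoring message type
```

Returning a ranked list rather than a single label keeps the sketch usable for cases where a letter plausibly belongs to more than one message type.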
(Automatic) document classification is generally defined as content-based assignment of one or more ...
Document analysis is responsible for essential progress in office automation. This paper is part ...
Where Information Retrieval (IR) and Text Categorization deliver a set of (ranked) documents accord...
This paper presents the INFOCLAS system applying statistical methods of information retrieval primar...
Project ALV conducted research in Document Analysis. The main goal of ALV was developing a phototypi...
The presented paper describes statistical methods (information gain, mutual X^2 statistics, and TF-I...
This paper performs a study on the pre-processing phase of the automated text classification problem...
This thesis presents the application of various classification techniques on text documents. Since t...
Analysis of large text data sets is gaining popularity providing the users some insights into their ...
This report focuses on analysis steps necessary for a paper document processing. It is divided in th...
This dissertation introduces a new theoretical model for text classification systems, including syst...