This paper describes a new method for the classification of an HTML document into a hierarchy of categories. The hierarchy of categories is involved in all phases of automated document classification, namely feature extraction, learning, and classification of a new document. The innovative aspects of this work are the feature selection process, the automated threshold determination for classification scores, and an experimental study on real-world Web documents that can be associated with any node in the hierarchy. Moreover, a new measure for the evaluation of system performance has been introduced in order to compare three different techniques (flat, hierarchical with proper training sets, hierarchical with hierarchical training sets). The me...
In the last decade, interest in the hierarchical organization of documents has increased. New chal...
Abstract. In a text categorization task, classification on some hierarchy of classes shows better r...
In this work we implement and evaluate a methodology to classify multi-labeled web documents into la...
In this paper, the problem of classifying HTML documents into a hierarchy of categories is invest...
Abstract. In this paper, the problem of classifying HTML documents into a hierarchy of categories ...
Abstract. This paper describes a method for the automatic classification of an HTML document into a h...
Most of the research on text categorization has focused on classifying text documents into a set of ...
Most work on text categorization has focused on classifying documents into a set of categories ...
This paper describes automatic document categorization based on a large text hierarchy. We handle the...
In this paper, the problem of classifying HTML documents is investigated in the context of a client-...
Searching for Web sites is one of the most common tasks performed on the Web. Web page classificatio...
In this paper, we present a new technique, the Admixture MCRDR-FCA (AMF) algorithm, for Web d...
Traditional machine learning classifications of HTML documents focus on features drawn from term in...
Automatic classification of web pages is an effective way to deal with the difficulty of retrieving ...
In a text categorization task, classification on some hierarchy of classes shows better results than...