This paper presents a machine-learning approach for ranking web documents according to the proportion of procedural text they contain. By 'procedural text' we refer to ordered lists of steps, which are very common in some instructional genres such as online manuals. Our initial training corpus is built by applying simple heuristics to select documents from a large collection, and it contains only a few documents with a large proportion of procedural text. We adapt the Naive Bayes classifier to better fit this less-than-ideal training corpus. This adapted model is compared with several other classifiers in ranking procedural texts using different sets of features and is shown to perform well when only highly distinctive features are us...
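To make the ranking idea concrete, the following is a minimal sketch of a standard multinomial Naive Bayes scorer with Laplace smoothing, restricted to a small set of distinctive features. It is not the adapted model described above, whose details are not given here; the function names (`train_nb`, `score`, `rank`), the toy documents, and the feature vocabulary are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(docs, labels, vocab):
    """Estimate Laplace-smoothed multinomial Naive Bayes parameters.

    docs   : list of token lists
    labels : 1 = procedural, 0 = non-procedural (hypothetical labelling)
    vocab  : the set of distinctive features to keep (an assumption)
    """
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for tokens, y in zip(docs, labels):
        counts[y].update(t for t in tokens if t in vocab)
    model = {"prior": {}, "loglik": {}}
    for y in (0, 1):
        total = sum(counts[y].values())
        model["prior"][y] = math.log(n_docs[y] / len(labels))
        model["loglik"][y] = {
            w: math.log((counts[y][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return model

def score(model, tokens):
    """Log-odds that a document is procedural; higher means ranked earlier."""
    s = model["prior"][1] - model["prior"][0]
    for t in tokens:
        if t in model["loglik"][1]:  # ignore tokens outside the feature set
            s += model["loglik"][1][t] - model["loglik"][0][t]
    return s

def rank(model, docs):
    """Return document indices sorted from most to least procedural."""
    return sorted(range(len(docs)), key=lambda i: score(model, docs[i]),
                  reverse=True)

# Toy, made-up data purely for illustration.
train_docs = [
    ["first", "click", "then", "press"],
    ["the", "history", "of", "company"],
    ["step", "open", "then", "select"],
    ["our", "team", "was", "founded"],
]
train_labels = [1, 0, 1, 0]
vocab = {"first", "then", "step", "click", "press", "open", "select",
         "history", "team", "founded"}

model = train_nb(train_docs, train_labels, vocab)
test_docs = [["history", "team"], ["first", "then", "click"]]
order = rank(model, test_docs)
```

Ranking by log-odds rather than by a hard class decision fits the stated goal of ordering documents by how procedural they are, and restricting `vocab` mirrors the observation that a small set of highly distinctive features can suffice.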
Thesis (M.S.)--University of Hawaii at Manoa, 2008. Includes bibliographical references (leaves 56-58...
The work presents the field of document classification. It describes existing techniques with emphas...
In order to gain information from a huge amount of text more efficiently and accurately, readers may u...
This paper describes automatic document categorization based on a large text hierarchy. We handle the...
Ioan Pop. To perform document ranking or Web mining tasks, we have considered an approach base...
This paper proposes an efficient algorithm for the generation of new features that enrich the known ...
We describe a new family of topic-ranking algorithms for multi-labeled documents. The motivation for...
This paper presents an extension of prior work by Michael D. Lee on psychologically plausible text c...
The automated classification of texts into predefined categories has witnessed a booming interest, d...
Learning to Rank (LtR) is an effective machine learning methodology for inducing high-quality docume...
We present an approach to text categorization using machine learning techniques. The approach is dev...
This paper describes the usage of machine learning techniques to assign keywords to documents. The l...
There are numerous text documents available in electronic form. More and more are becoming available...