The World Wide Web holds a wealth of information related to almost any text classification task. This paper presents a method for mining the web to improve text classification by creating a background text set. Our algorithm uses the information gain criterion to create lists of important words for each class of a text categorization problem. It then searches the web on various combinations of these words to produce a set of related data. We use this set of background text with Latent Semantic Indexing classification to create an expanded term-by-document matrix on which singular value decomposition is performed. We provide empirical results showing that this approach improves accuracy on unseen test examples in different domains.
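The first two steps described in the abstract (ranking words by information gain and combining the top words of each class into search queries) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy documents, the function names, the rule that assigns each word to the class where it occurs most often, and the choice of two-word queries are all assumptions made for the example. The web search itself and the Latent Semantic Indexing step are left out.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(counts):
    """Shannon entropy in bits of a label distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, word):
    """Reduction in class entropy from knowing whether `word` occurs in a document."""
    with_word = [lab for doc, lab in zip(docs, labels) if word in doc]
    without_word = [lab for doc, lab in zip(docs, labels) if word not in doc]
    remainder = 0.0
    for part in (with_word, without_word):
        if part:
            remainder += (len(part) / len(labels)) * entropy(Counter(part).values())
    return entropy(Counter(labels).values()) - remainder


def top_words_per_class(docs, labels, k=2):
    """Assumed heuristic: give each word to the class whose documents contain
    it most often, then keep the k highest-gain words per class."""
    vocab = set().union(*docs)
    ranked = {cls: [] for cls in set(labels)}
    for word in vocab:
        df = Counter(lab for doc, lab in zip(docs, labels) if word in doc)
        best_cls = max(sorted(df), key=lambda c: df[c])
        ranked[best_cls].append((information_gain(docs, labels, word), word))
    return {
        cls: [w for _, w in sorted(pairs, key=lambda p: (-p[0], p[1]))[:k]]
        for cls, pairs in ranked.items()
    }


def build_queries(words_by_class, size=2):
    """Form one search query string per combination of `size` words."""
    return {
        cls: [" ".join(combo) for combo in combinations(words, size)]
        for cls, words in words_by_class.items()
    }


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"goal", "match", "team"},
    {"team", "score", "goal"},
    {"election", "vote", "party"},
    {"party", "vote", "policy"},
]
labels = ["sport", "sport", "politics", "politics"]

top_words = top_words_per_class(docs, labels, k=2)
queries = build_queries(top_words, size=2)
```

In a full pipeline, each query string would be sent to a search engine and the returned pages added as extra columns of the term-by-document matrix before the singular value decomposition is computed.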
This paper presents a text-mining approach in order to extract candidate terms...
Text Classification, also called Text Categorization (TC), is the task of classifying a set of ...
Studies on ontologies are receiving growing attention due to their well-known nature of explicit k...
We illustrate that Web searches can often be utilized to generate background text for use with text...
2000 Kyoto International Conference on Digital Libraries : research and practice, 11/13/2000 - 11/16...
This paper presents work that evaluates background knowledge for use in improving accuracy for text...
Text mining deals with retrieval of specific information provided by customer search engines. With t...
This paper has the objective of developing a methodology to store a great mass of semi-structured da...
This paper addresses the problem of categorizing terms or lexical entities into a predefined set of ...
Data acquisition is a major concern in text classification. The excessive human efforts required by ...
Web search results are far from perfect due to the polysemous and synonymous characteristics of natu...
The technology world has evolved greatly over the past decades, which has led to inflated data volume. This ...
Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to t...
Little is known about the content of the major search engines. We present an automatic le...