Given only the URL of a Web page, can we identify its topic? We study this problem in detail by exploring a large number of different feature sets and algorithms on several datasets. We also show that the inherent overlap between topics and the sparsity of the information in URLs make this a very challenging problem. Web page classification without a page's content is desirable when the content is not available at all, when a classification is needed before obtaining the content, or when classification speed is of utmost importance. For our experiments we used five different corpora comprising a total of about 3 million (URL, classification) pairs. We evaluated several feature-generation techniques and classification algorithms. The in...
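The abstract does not say which feature sets or classifiers performed best, so the following is only a minimal sketch of the general idea of classifying a page from its URL alone. It assumes two common, illustrative choices that may differ from the authors' method: features are the URL's alphabetic tokens plus character 4-grams of each token, and the classifier is a multinomial Naive Bayes with add-one smoothing. The stop-token list, the helper names (`url_tokens`, `char_ngrams`, `url_features`, `UrlNaiveBayes`), and the toy training URLs and labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical list of tokens that carry little topical signal.
STOP_TOKENS = {"www", "com", "org", "net", "html", "htm"}

def url_tokens(url: str) -> list[str]:
    """Lowercase the URL, drop the scheme, split on non-letters (digits are dropped too)."""
    url = re.sub(r"^[a-z]+://", "", url.lower())
    return [t for t in re.split(r"[^a-z]+", url) if t and t not in STOP_TOKENS]

def char_ngrams(token: str, n: int = 4) -> list[str]:
    """Character n-grams of a token padded with '_' at both ends."""
    padded = f"_{token}_"
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def url_features(url: str, n: int = 4) -> Counter:
    """Bag of whole-token features plus character n-gram features."""
    feats = Counter()
    for tok in url_tokens(url):
        feats["tok=" + tok] += 1
        for gram in char_ngrams(tok, n):
            feats["ng=" + gram] += 1
    return feats

class UrlNaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing over URL features."""

    def __init__(self, n: int = 4, alpha: float = 1.0):
        self.n = n
        self.alpha = alpha
        self.class_docs = Counter()             # number of training URLs per label
        self.feat_counts = defaultdict(Counter)  # feature counts per label
        self.total_feats = Counter()            # total feature count per label
        self.vocab = set()

    def fit(self, pairs):
        for url, label in pairs:
            feats = url_features(url, self.n)
            self.class_docs[label] += 1
            self.feat_counts[label].update(feats)
            self.total_feats[label] += sum(feats.values())
            self.vocab.update(feats)
        return self

    def predict(self, url: str) -> str:
        feats = url_features(url, self.n)
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, docs in self.class_docs.items():
            # log prior + sum of smoothed log likelihoods of the URL's features
            score = math.log(docs / n_docs)
            denom = self.total_feats[label] + self.alpha * v
            for f, c in feats.items():
                score += c * math.log((self.feat_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

# Hypothetical toy (URL, classification) pairs, for illustration only.
TRAIN = [
    ("http://www.espn.com/nba/scores", "sports"),
    ("http://sports.yahoo.com/football/news", "sports"),
    ("http://www.nfl.com/teams/schedule", "sports"),
    ("http://www.allrecipes.com/recipe/chicken-soup", "cooking"),
    ("http://www.foodnetwork.com/recipes/pasta-bake", "cooking"),
    ("http://cooking.example.org/soup/recipes", "cooking"),
]

if __name__ == "__main__":
    model = UrlNaiveBayes().fit(TRAIN)
    print(model.predict("http://news.site/nba/scores-today"))
```

Character n-grams are included because URL tokens are often concatenations (for example `allrecipes` and `recipes` share the 4-grams of `recipes`), which partly offsets the sparsity the abstract mentions; the topic overlap it also mentions is not addressed by a single-label classifier like this one.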
The immense number of documents published on the web requires the utilization ...
People frequently surf the internet using smartphones, laptops, or computers in order to ...
The World Wide Web is one of the most widely used information resources. Understanding the web bette...
Abstract: The World Wide Web grows enormously day by day. Hence it is necessary for classifyi...
In some situations today, it is important to have an efficient and reliable clas...
With the number of web pages increasing exponentially every day, it is very difficult for a searc...
When automatically extracting information from the World Wide Web, most established meth...
The Internet contains a vast amount of data that is growing exponentially. To exploit this data, a W...
The task of topical classification of Web queries is to classify Web queries into a set of target ca...
Many term-weighting methods have been proposed in the literature for Information Retrieval and Text Categ...
Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are of...
Nowadays, when a keyword is provided, a search engine can return a large number of web pages, which ...