A focused crawler aims to discover as many web pages relevant to a target topic as possible while avoiding irrelevant ones. Reinforcement Learning (RL) has been used to optimize focused crawling. In this paper, we propose TRES, an RL-empowered framework for focused crawling. We model the crawling environment as a Markov Decision Process, which the RL agent aims to solve by determining a good crawling strategy. Starting from a few human-provided keywords and a small text corpus, both expected to be relevant to the target topic, TRES follows a keyword set expansion procedure, which guides crawling, and trains a classifier that constitutes the reward function. To avoid a computationally infeasible brute force method for selecting...
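The framing in the abstract above (each crawl step is an action that picks one URL from the frontier, and the reward comes from a relevance classifier) can be illustrated with a minimal sketch. This is not the TRES algorithm: the toy web graph `TOY_WEB`, the keyword set `KEYWORDS`, the keyword-match `reward` function standing in for the trained classifier, and the greedy parent-relevance heuristic standing in for the learned policy are all assumptions made for illustration.

```python
import heapq

# Hypothetical toy web graph: URL -> (page text, outgoing links).
TOY_WEB = {
    "seed": ("sports news portal", ["a", "b"]),
    "a": ("football match report", ["c"]),
    "b": ("cooking recipes", ["d"]),
    "c": ("football league table", []),
    "d": ("baking bread", []),
}
# Assumed stand-in for the human-provided (and expanded) keyword set.
KEYWORDS = {"football", "sports", "league"}


def reward(text):
    """Stand-in for the trained classifier: 1 if any keyword occurs, else 0."""
    return 1 if KEYWORDS & set(text.split()) else 0


def crawl(seed, budget):
    """Fetch up to `budget` pages; each step (action) pops one frontier URL."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    fetched, total_reward = [], 0
    while frontier and len(fetched) < budget:
        _, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]
        r = reward(text)
        total_reward += r
        fetched.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Greedy heuristic (not a learned policy): a link inherits
                # the relevance of the page it was found on.
                heapq.heappush(frontier, (-float(r), link))
    return fetched, total_reward


if __name__ == "__main__":
    print(crawl("seed", budget=4))
```

With a budget of four fetches, the sketch visits the irrelevant page `b` once but never follows its link to `d`, because links found on irrelevant pages are given the lowest priority. An RL agent would replace the fixed heuristic with a policy learned from the observed rewards.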
In recent years, the World Wide Web has shown enormous growth in size. Vast repositories of informat...
A search engine uses a web crawler to crawl the pages from the world wide web ...
Sparse reward is one of the biggest challenges in reinforcement learning (RL). In this paper, we pro...
Focused crawling aims at collecting as many Web pages relevant to a target top...
Consider the task of exploring the Web in order to find pages of a particular kind or on a particula...
Focused crawling is the process of exploring a graph iteratively, focusing on parts of the graph rel...
The Web is rapidly transforming from a pure document collection to the largest connected public data...
The organization of HTML into a tag tree structure, which is rendered by browsers as roughly rectang...
We propose a novel deep web crawling framework based on reinforcement learning. The crawler is regar...
A baseline crawler was developed at Bilkent University based on a focused-crawling approach. The...
The rapid growth of the World Wide Web has made the problem of useful resource discovery an importan...
General crawlers use a breadth-first search to download as many pages as possible. Focused cr...
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose cr...