Focused crawling aims at collecting as many Web pages relevant to a target topic as possible while avoiding irrelevant pages, reflecting limited resources available to a Web crawler. We improve on the efficiency of focused crawling by proposing an approach based on reinforcement learning. Our algorithm evaluates hyperlinks most profitable to follow over the long run, and selects the most promising link based on this estimation. To properly model the crawling environment as a Markov decision process, we propose new representations of states and actions considering both content information and the link structure. The size of the state-action space is reduced by a generalization process. Based on this generalization, we u...
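The link-selection idea in the abstract above (estimate the long-run value of each hyperlink, then follow the most promising one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the linear value estimate, the temporal-difference update, the feature name `anchor_on_topic`, the classes `LinkScorer` and `Frontier`, and the default learning rate and discount are all hypothetical choices introduced here.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical feature vector for a (state, action) pair, e.g. page and link features.
Features = Dict[str, float]


class LinkScorer:
    """Assumed linear estimate of long-run reward: Q(s, a) = w . phi(s, a)."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.9) -> None:
        self.weights: Dict[str, float] = {}
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)

    def q_value(self, phi: Features) -> float:
        # Dot product of the weights with the features; unseen features weigh 0.
        return sum(self.weights.get(k, 0.0) * v for k, v in phi.items())

    def update(self, phi: Features, reward: float, next_q: float) -> None:
        # One temporal-difference step toward reward + gamma * next_q.
        td_error = reward + self.gamma * next_q - self.q_value(phi)
        for k, v in phi.items():
            self.weights[k] = self.weights.get(k, 0.0) + self.alpha * td_error * v


class Frontier:
    """Priority queue of unvisited links, highest estimated value first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, url: str, score: float) -> None:
        # heapq is a min-heap, so the score is negated to pop the maximum first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

In a crawl loop built on this sketch, each fetched page would yield a reward (for example 1 if the page is relevant to the topic, else 0), the scorer would be updated with that reward, and the page's outgoing links would be pushed onto the frontier with their estimated values.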
A focused crawler is an efficient tool used to traverse the Web to gather documents on a specific to...
The organization of HTML into a tag tree structure, which is rendered by browsers as roughly rectang...
The rapid growth of the World Wide Web has made the problem of useful resource discovery an importan...
Consider the task of exploring the Web in order to find pages of a particular kind or on a particula...
This work addresses issues related to the design and implementation of focused crawlers. Several var...
A focused crawler aims at discovering as many web pages relevant to a target topic as possible, whil...
In recent years, the World Wide Web has shown enormous growth in size. Vast repositories of informat...
Focused crawling is the process of exploring a graph iteratively, focusing on parts of the graph rel...
Focused crawlers aim to automatically discover online content resources relevant to a domain of inte...
We propose a novel deep web crawling framework based on reinforcement learning. The crawler is regar...
In this paper we compare our selection-based learning algorithm with the reinforcement le...
The Web is rapidly transforming from a pure document collection to the largest connected public data...