In this paper, we propose a Web page archiving system that combines state-of-the-art comparison methods based on the source code of Web pages with computer vision techniques. To detect whether successive versions of a Web page are similar, our system relies on: (1) a combination of structural and visual comparison methods embedded in a statistical discriminative model, (2) a visual similarity measure designed for Web pages that improves change detection, and (3) a supervised feature selection method adapted to Web archiving. We train a Support Vector Machine model on vectors of similarity scores between successive versions of pages. The trained model then determines whether two versions, defined by their vector of similarity scor...
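The classification step above can be sketched as a linear SVM over similarity-score vectors. This is a minimal sketch under stated assumptions, not the authors' implementation. The kernel is assumed to be linear, and training uses Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. The two features (one structural score and one visual score), the training data, and the helper names `train_linear_svm` and `is_similar` are all hypothetical. The supervised feature selection step is not shown.

```python
import random
from typing import List, Sequence

Vector = List[float]


def augment(x: Sequence[float]) -> Vector:
    """Append a constant 1.0 so the bias is learned (and regularized) as an ordinary weight."""
    return list(x) + [1.0]


def train_linear_svm(samples: Sequence[Sequence[float]], labels: Sequence[int],
                     lam: float = 0.1, epochs: int = 2000, seed: int = 0) -> Vector:
    """Linear SVM trained with Pegasos-style stochastic subgradient descent
    on the L2-regularized hinge loss. Labels must be +1 (similar) or -1 (changed)."""
    rng = random.Random(seed)
    data = [(augment(x), y) for x, y in zip(samples, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Gradient step for the regularizer: shrink every weight.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active for this sample: move towards y * x.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def is_similar(w: Vector, scores: Sequence[float]) -> bool:
    """Decide whether two versions are similar from their similarity-score vector."""
    return sum(wj * xj for wj, xj in zip(w, augment(scores))) > 0.0


if __name__ == "__main__":
    # Hypothetical (structural score, visual score) vectors for pairs of versions.
    pairs = [[0.95, 0.90], [0.90, 0.85], [0.85, 0.95],   # similar versions
             [0.20, 0.30], [0.30, 0.10], [0.10, 0.25]]   # significant change
    labels = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(pairs, labels)
    print(is_similar(w, [0.92, 0.88]), is_similar(w, [0.15, 0.20]))
```

With a linear kernel, each learned weight shows how much one comparison method contributes to the decision. That makes it a reasonable first choice before trying nonlinear kernels.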
In this paper, a novel approach is introduced to compare web sites by analysing their web page conte...
Due to the growing importance of the Web, several archiving institutes (nation...
As the number of web pages increases, users' search for useful information on web sites will becom...
In this paper, we present a Web page archiving approach that combines image and structural techniques. Ou...
When we describe a Web page informally, we often use phrases like "it looks like a newspap...
Nowadays, many applications are interested in detecting and discovering change...
Despite the exponential growth of the WWW and the success of the Semantic Web, there is limited sup...
Due to the growing importance of the World Wide Web, archiving it has become c...
This work deals with the design of a system for on-line analysis of web page similarity. The system co...
Although there are millions of websites on the Internet, half of those we come across do not provid...
A relevant consequence of the expansion of the web and e-commerce is the growth in the demand for new...
Due to the growing importance of the World Wide Web, archiving the web has...
Web archives offer a rich and plentiful source of information to researchers, analysts, and legal ex...
In this paper, we investigate the effect of using clustering algorithms in the reverse engineering fi...
We propose an approach to automatically detect duplicated pages in dynamic Web sites. Our approach a...
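Here is a hedged illustration of duplicate-page detection. It is not necessarily the technique this abstract goes on to describe. One common approach compares the sequences of HTML tags of two pages using the Levenshtein edit distance and flags the pages as structural duplicates when the normalized distance is small. The function names, the regular expression, and the 0.1 threshold are hypothetical choices made for this sketch.

```python
import re
from typing import List, Sequence

# Matches an opening or closing tag name; skips <!DOCTYPE ...> and <!-- ... -->.
TAG_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)")


def tag_sequence(html: str) -> List[str]:
    """Extract the sequence of tag names (closing tags prefixed with '/'), ignoring text."""
    return [slash + name.lower() for slash, name in TAG_RE.findall(html)]


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


def are_duplicates(html_a: str, html_b: str, threshold: float = 0.1) -> bool:
    """Flag two pages as structural duplicates when their tag-sequence edit
    distance, normalized by the longer sequence, is at most `threshold`."""
    a, b = tag_sequence(html_a), tag_sequence(html_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two pages without any tags are trivially identical in structure
    return levenshtein(a, b) / longest <= threshold
```

Because the comparison ignores text content, two pages generated from the same template with different data are reported as duplicates. This is the kind of redundancy that dynamic sites tend to produce.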