The growing amount of information published on the Web, combined with its dynamic nature, raises many challenges in managing and retrieving information and in provisioning the underlying infrastructure. Search engines have to meet two conflicting requirements: minimizing the number of downloads and providing up-to-date information. In this paper, we present the results of an exploratory analysis aimed at investigating the novelty of the content of a news Web site. We analyzed the Web site from a horizontal perspective, focusing on the content of the individual articles, and from a vertical perspective, focusing on the entire collection of articles published on the site. These two perspectives allowed us to study ho...
The aim of this work is the longitudinal study of the evolution and the state of 738 web sites in tw...
The present paper deals with a system for crawling and content extraction from news sites. The syste...
The Web is a massive and interlinked collection of documents, built using a decentralized design to ...
The Web has become a ubiquitous tool for distributing knowledge and information and for conducting b...
The Web has become a ubiquitous tool for distributing knowledge and information and for conducting ...
News sites on the World Wide Web pose challenges for information retrieval due to their dynamic cont...
Identifying and tracking new information on the Web is important in sociology, marketing, and surve...
Web sites are becoming important assets for several companies, which need to incorporate sophisticat...
Web pages are created, modified and removed at unspecified times by their owners. The frequency and...
We present a detailed study of the part of the Web related to media content, i.e., the Med...
The World Wide Web is growing at an enormous speed, and has become an indispensable source for infor...
Since their emergence in the mid-90s, online media have evolved from simple digital editions that me...
Based on repetitive visits to three Scandinavian newspaper organizations, this paper presents trajec...
We present an analysis of the prevalence and nature of structural changes of websites. We study the ...