To analyze weblogs, we must first identify the appropriate unit of a weblog that corresponds to a document. We argue in this paper that, for weblogs, the correct unit is the weblog post. A weblog post is a structured document with the following fields: date, timestamp, title, content, permalink, and author. We present our approach for segmenting weblogs into posts, which breaks down into several steps: (1) automatic feed discovery; (2) feed-guided segmentation, using the weblog feed and HTML; and (3) model-based weblog segmentation.
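The post fields and the first step (automatic feed discovery) can be illustrated with a minimal sketch. The abstract does not say how feeds are discovered, so this example assumes the common autodiscovery convention in which a weblog's HTML head advertises its feed through a `<link rel="alternate">` element. The names `WeblogPost`, `FeedLinkParser`, and `discover_feeds` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin

# MIME types commonly used to advertise RSS and Atom feeds (an assumption of
# this sketch, not something the abstract specifies).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


@dataclass
class WeblogPost:
    """Hypothetical container for the six fields named in the abstract."""
    date: Optional[str] = None
    timestamp: Optional[str] = None
    title: Optional[str] = None
    content: Optional[str] = None
    permalink: Optional[str] = None
    author: Optional[str] = None


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.feeds: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        href = attr.get("href", "")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative feed links against the weblog's URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html: str, base_url: str) -> List[str]:
    """Return the feed URLs advertised in the given HTML page."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Given a discovered feed, each feed entry would supply candidate values for a `WeblogPost`, which feed-guided segmentation could then match against the weblog's HTML. That later step is not sketched here because the abstract gives no detail on it.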
In this paper, we propose the architecture for a weblog data mining system. Our objective is to allo...
Blogs are one of the most prominent means of communication on the web. Their content, interconnectio...
This paper proposes a fully automated information extraction methodology for weblogs. The methodolog...
User generated content forms an important domain for mining knowledge. In this paper, we address the...
Abstract — The analysis of weblogs has become a popular area of natural language processing. Due to ...
Many new electronic publication systems have arisen in the past few years; one of them is the ...
Blogs are a dynamic communication medium which has been widely established on the web. The BlogForev...
[EN] In the last 10 years, the information generated on weblog sites has increased exponentially, re...
In recent years we have seen a vast increase in the volume of information published on weblog sites ...
Abstract: I propose the concept of a latent weblog community (LBC), as a means to promote the autono...
Weblogs, or blogs, are becoming more and more interesting for a wide audience. Millions of personal,...
Abstract: Blogs, news portals and discussion forums are of high interest for today’s social interactio...