The dataset comprises the homepages of several hundred IT blogs and websites, hand-picked to represent discourses on questions at the intersection of technology and society in Germany and the United States. The corresponding text collection can be reproduced by updating its contents and downloading them to the user’s local machine: see https://zenodo.org/record/4552529 and https://github.com/adbar/trafilatura. Online search of the text corpus is also available at https://www.dwds.de/d/korpora/it_blogs (paper: "A Reproducible IT-Blog Corpus", doi.org/10.5334/johd.3).
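As a rough illustration of the reproduction step, the following minimal sketch downloads each listed homepage and extracts its main text. It assumes the trafilatura package is installed and uses its `fetch_url` and `extract` functions; the helper names `clean_homepages` and `rebuild_corpus` are hypothetical and not part of the dataset or the library. Discovery of individual articles (for example through feeds or sitemaps) is left out, so this is not the authors' actual pipeline.

```python
from typing import Callable, Dict, Iterable, List, Optional


def clean_homepages(lines: Iterable[str]) -> List[str]:
    """Strip whitespace, skip blank and comment lines, drop repeated URLs."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


def rebuild_corpus(
    homepages: Iterable[str],
    fetch: Optional[Callable[[str], Optional[str]]] = None,
    extract: Optional[Callable[[str], Optional[str]]] = None,
) -> Dict[str, str]:
    """Map each reachable URL to its extracted main text.

    `fetch` and `extract` can be swapped out (e.g. for offline testing);
    by default they fall back to trafilatura, an assumed dependency.
    """
    if fetch is None or extract is None:
        import trafilatura  # third-party, only needed for the defaults

        fetch = fetch or trafilatura.fetch_url
        extract = extract or trafilatura.extract

    corpus = {}
    for url in homepages:
        html = fetch(url)      # None when the download fails
        if not html:
            continue
        text = extract(html)   # None when no main text is found
        if text:
            corpus[url] = text
    return corpus
```

With a local copy of the homepage list (the file name `homepages.txt` is an assumption), a call could look like `rebuild_corpus(clean_homepages(open("homepages.txt", encoding="utf-8")))`. Because the pages are fetched at call time, the result reflects the current state of each site, which is what updating the collection's contents implies.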
The present paper reports the first results of the compilation and annotation of a blog corpus for G...
In recent years, linguists have become increasingly interested in the language of the Internet—both ...
To create the corpus, we first download 27,000 random news articles from the Reuters website (HTML webp...
The dataset comprises text and metadata extracted from several hundred IT-blogs and websites, along ...
Short paper talk at RESAW 2015 conference (Aarhus, Denmark). I would like to pr...
We introduce two corpora gathered on the web and related to computer-mediated communication: blog po...
Following the assumption that the tech blog sphere represents an avant-garde o...
This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cl...