This paper presents a new approach, and a software tool, for collecting specialized corpora on the Web. The approach takes advantage of RSS (Really Simple Syndication), a widely used XML-based standard for sharing content among websites. After a brief introduction to RSS, we explain why this type of data source is valuable for corpus development. Finally, we present Corporator, an Open Source program designed to collect corpora from RSS feeds.
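Because RSS is plain XML, collecting a corpus from a feed comes down to parsing the feed, extracting the text of each item and skipping items already collected on earlier polls. The sketch below illustrates that idea with the Python standard library only. It is not Corporator's actual implementation: it assumes RSS 2.0 without XML namespaces (Atom feeds would need different element names), uses `<guid>` (falling back to `<link>`) as a hypothetical deduplication key, and the names `collect_items`, `strip_html` and `SAMPLE_FEED` are illustrative.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Accumulates the text content of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Return the plain text of an HTML fragment with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return " ".join("".join(parser.parts).split())


def collect_items(rss_xml: str, seen: set) -> list:
    """Parse an RSS 2.0 document and return only items not collected before.

    Assumption: an item is identified by its <guid>, or by its <link> when
    no guid is present. Keys are added to `seen`, so polling the same feed
    again does not add duplicate texts to the corpus.
    """
    root = ET.fromstring(rss_xml)
    new_items = []
    for item in root.iter("item"):
        key = item.findtext("guid") or item.findtext("link")
        if key is None or key in seen:
            continue
        seen.add(key)
        new_items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": item.findtext("link"),
            # Descriptions often carry escaped HTML; keep only the text.
            "text": strip_html(item.findtext("description") or ""),
        })
    return new_items


# Hypothetical feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>First article</title>
      <link>http://example.org/a1</link>
      <guid>http://example.org/a1</guid>
      <description>&lt;p&gt;Corpus &lt;b&gt;text&lt;/b&gt; one.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.org/a2</link>
      <description>Plain text two.</description>
    </item>
  </channel>
</rss>"""

seen = set()
first = collect_items(SAMPLE_FEED, seen)
second = collect_items(SAMPLE_FEED, seen)  # same feed polled again
print(len(first), len(second))  # 2 0
print(first[0]["text"])  # Corpus text one.
```

Keeping the `seen` set outside the function is a deliberate choice in this sketch: a collector that polls feeds periodically can persist that set between runs so that each new poll contributes only unseen articles.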
A corpus is a collection of texts in electronic form that are the object of literary or linguistic s...
Efforts to use web data as corpora seek to provide solutions to problems traditional corpora suffer ...
In this paper we present the Corpógrafo, an integrated web-based environment for corpus linguistics ...
This article describes a software application that downloads given RSS feeds and compiles them into ...
This paper presents GlossaNet 2, a free online concordance service that enables users to search into...
The RSS Feed Analysis Application and Corpus Builder is a software application that downloads given ...
The conventional tools of the "web as corpus" framework rely heavily on URLs o...
We describe our use of RSS news feeds to quickly assemble a parallel English-Japanese corpus. Our me...
As corpus building is an activity that takes time and costs money, readers may wish to use ready-ma...
Over the last decade, methods of web corpus construction and the evaluation of web corpora have been...
This work presents the tagging and formatting of a text-data corpus. It creates a layer above ...
The role of the Web for text corpus construction is becoming increasingly significant. However, the ...
We present a web service for quickly producing corpora for specialist areas, in any of a range of la...
We investigate the potential of using the web as a huge corpus for language studies. We test the hyp...
For typical Web developers, it is complicated to integrate content from the Semantic Web into an exist...