To create the corpus, we first downloaded 27,000 random news articles (HTML web pages) from the Reuters website, each classified under one of the following categories: Health, Art, Politics, Sports, Science, Technology, Economy, and Business. Next, we extracted from each article its title, its body, and the category to which it belongs. Finally, we stored the title, body, and category of each downloaded article in our database. As a result, after removing duplicates, we obtained a corpus of 23,863 documents, which we randomly split into a training sequence of 14,356 documents and a test sequence of 9,507 documents.
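The deduplication and random split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the `Article` record, the `dedupe` and `random_split` helpers, the duplicate criterion (same title and body after trimming and lowercasing), and the fixed seed are all hypothetical, since the text does not say how duplicates were detected or how the split was drawn.

```python
import random
from typing import List, NamedTuple, Tuple


class Article(NamedTuple):
    """Hypothetical record holding the three fields stored per article."""
    title: str
    body: str
    category: str


def dedupe(articles: List[Article]) -> List[Article]:
    """Keep the first occurrence of each (title, body) pair.

    Assumption: two articles are duplicates when their title and body
    match after trimming whitespace and lowercasing.
    """
    seen = set()
    unique = []
    for article in articles:
        key = (article.title.strip().lower(), article.body.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


def random_split(
    articles: List[Article], n_train: int, seed: int = 0
) -> Tuple[List[Article], List[Article]]:
    """Shuffle a copy of the corpus and cut it into training and test parts."""
    shuffled = list(articles)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return shuffled[:n_train], shuffled[n_train:]
```

With the figures reported above, a call such as `random_split(corpus, n_train=14356)` on the 23,863 deduplicated documents would leave 9,507 documents in the test part.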
This corpus contains 150,466 news articles, which are derived from several freely accessible Indonesi...
A corpus of 471,085,690 English sentences extracted from the ClueWeb12 Web Crawl. The sentences were...
We have built a corpus containing texts in 106 languages from texts available on the Internet and on...
AG’s news corpus (AGNEWS): This AG’s corpus of news articles was collected from the web. The whole c...
A Unix-based system is presented which automatically collects newspaper articles from the web, converts ...
In recent years, several datasets have been released that include images and text, givin...
The 20 newsgroups corpus is a widely used corpus belonging to 20 related catego...
Efforts to use web data as corpora seek to provide solutions to problems traditional corpora suffer ...
In NLP Centre, dividing text into sentences is currently done with a tool which uses rule-based sy...
The culture of online-news consumption continues to take shape and is gaining popularity, increasing...