Interest in spoken-language corpora has increased over the past two decades, leading to the development of new corpora and the discovery of new facets of spoken language. Such corpora are the most comprehensive source of data on the language of ordinary speakers: they are based on spontaneous, unscripted speech that spans a variety of styles, registers and dialects. This paper presents the Croatian Adult Spoken Language Corpus (HrAL), its structure and its possible applications in different linguistic subfields. HrAL was built by sampling spontaneous conversations among 617 speakers from all Croatian counties, and it comprises more than 250,000 tokens and more than 100,000 types. Data were collected d...
The Croatian language, comprising huge differences considering the number of its speakers, being ver...
In this paper we present hr500k, a Croatian reference training corpus of 500 thousand tokens, segmen...
In the last decade, corpus linguistics has finally established itself as a separate research startin...
Corpora, as annotated archives of human communication, are objective, reliable resources for languag...
In this paper we present a corpus of audio and video recordings of spontaneous, face-to-face multi-p...
The Croatian Language Corpus was built between 2007 and 2011 at the Institute of Croatian Language a...
In this paper a short description of activities towards building a general speech corpus o...
This paper describes the Norwegian broadcast news speech corpus RUNDKAST. The corpus contains record...
The Croatian web corpus hrWaC was built by crawling the .hr top-level domain in 2011 and again in 20...
The Torlak corpus represents a spoken variety of the endangered Torlak dialect from the Timok area in So...
A comprehensive corpus of user comments on online news articles on the topic of language from major ...
The paper describes data collection and transcription to develop the Croatian discourse corpus of sp...
A comprehensive corpus of news articles on the topic of language, published in major daily newspaper...
This article introduces a new speech corpus, the Nijmegen Corpus of Casual Czech (NCCCz), which cont...