Exploring language usage through frequency analysis of large corpora is a defining feature of most recent work in corpus and computational linguistics. From a psycholinguistic perspective, however, the corpora used in these contributions are often not representative of language usage: they are either domain-specific, limited in size, or extracted from unreliable sources. In an effort to address this limitation, we introduce SubIMDB, a corpus of everyday spoken language that we created, containing over 225 million words. The corpus was extracted from 38,102 subtitles of family, comedy and children's movies and series, and is the first sizeable structured corpus of subtitles made available. Our experiments show that word frequency norms extr...
Capitalizing on Google's Ngram corpus, we examined the possibility to esta...
We present a new database of Dutch word frequencies based on film and television subtitles, and we v...
This study evaluates the potential for incidentally learning early reading vocabulary through the ex...
Conference paper: Collecting and Exploring Everyday Language for Predicting Psycholinguistic Propert...
We examine the use of film subtitles as an approximation of word frequencies in human interactions. ...
Accepted manuscript. Epub ahead of print, 29 Sep. 2014. We examined the potential advantage of the le...
This paper introduces a novel collection of word embeddings, numerical representations of lexical se...
Previous evidence has shown that word frequencies calculated from corpora based on film and televisi...
BACKGROUND: Word frequency is the most important variable in language research. However, despite the...
We present SUBTLEX-PL, Polish word frequencies based on movie subtitles. In two lexical decision e...
This paper investigates online film subtitles as a separate register of communication from a quantit...
Word frequency is the most important variable in language research. However, despite the growing int...
Recent studies have shown that word frequency estimates obtained from films and television subtitles...
Linguistic research benefits from the wide range of resources and software too...