In this paper we present a novel intersection between corpus linguistics and behavioral data that can serve as an evaluation metric for resources for low-density languages, drawing on well-established psycholinguistic factors. Using the low-density language Maltese as a test case, we highlight the challenges facing researchers who develop resources for languages with sparse data, and we identify a key empirical link between corpus and psycholinguistic research that can be used to evaluate corpus resources. Specifically, we compare two robust variables identified in the psycholinguistic literature: word frequency (measured in a corpus) and word familiarity (measured in a rating task). We then use three statistical metho...
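The abstract above compares corpus word frequency with rated word familiarity. The following is a minimal Python sketch of one common way to relate the two, assuming a tokenized corpus and a mean familiarity rating per word. It uses Spearman rank correlation, which is an assumption here and not necessarily one of the three (elided) statistical methods the authors apply. The helper names (`frequency_per_million`, `ranks`, `spearman`) and the toy tokens and ratings are hypothetical, invented purely for illustration.

```python
import math
from collections import Counter


def frequency_per_million(tokens):
    """Count each word form and normalize to occurrences per million tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def ranks(values):
    """Assign 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither list is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: invented for illustration, not taken from the study.
tokens = "the cat sat on the mat the dog sat on the cat".split()
familiarity = {"the": 7.0, "cat": 6.1, "sat": 5.2, "mat": 3.9, "dog": 5.8}

freq = frequency_per_million(tokens)
words = [w for w in familiarity if w in freq]  # keep rated words attested in the corpus
log_freq = [math.log10(freq[w]) for w in words]
ratings = [familiarity[w] for w in words]

rho = spearman(log_freq, ratings)
print(f"Spearman rho = {rho:.3f}")  # prints "Spearman rho = 0.738"
```

Because Spearman's rho depends only on ranks, the log transform does not change it; the log is kept because log frequency is the usual scale when frequency is instead entered into a parametric model such as a linear regression.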
Corpus-based research is not in opposition to other methodologies of language investigation, such as...
Statistical language modelling may not only be used to uncover the patterns which underlie the compo...
Exploring language usage through frequency analysis in large corpora is a defining feature in most r...
A corpus is a collection of authentic, non-elicited texts selected and assembled to study language. ...
This paper presents efforts to evaluate the representativeness of the Setswana corpus data with meas...
Linguistics has drawn on the large quantities of authentic data contained in language corpora for se...
The quality of statistical measurements on corpora is strongly related to a strict definition of the...
An article on learner corpora has just been published in the journal Language Learning. Actu...
Conference paper: Collecting and Exploring Everyday Language for Predicting Psycholinguistic Propert...
This volume brings together a selection of papers originally presented at the ...
Reading corpora are text collections that are enriched with processing data. From a corpus linguist’...
When analysing corpora with automatic and statistical means, one should remember that the raw materi...
Large data resources play an increasingly important role in both linguistics and psycholinguistics. ...
With the advent of corpus linguistics, the use of corpora has become central in linguistics. One und...