In this paper we propose a new method for the automatic extraction of set phrases from corpora: the Corpus Proximity Ratio (CPR), based on the average proximity between grams within a given window. This score is non-parametric and is close to the vector space models used in information retrieval. Although the score still needs experimental confirmation, the preliminary results obtained (Colson 2010, Colson & Granger 2011) and the comparison with native speaker judgment reveal a high degree of precision, while recall still needs to be explored. This paper also reports the results of an experiment carried out in collaboration with the Centre for English Corpus Linguistics at Louvain University. We will argue that translation corpora and ad...
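To make the idea of a window-based proximity score concrete, the following is a minimal sketch and not the published CPR formula, which this abstract does not spell out. It assumes one possible reading: the number of strictly contiguous occurrences of a sequence of grams is divided by the number of occurrences in which each gram follows the previous one within a window of `window` tokens. The function names `windowed_matches` and `proximity_ratio`, the greedy left-to-right matching, and the toy sentence are all illustrative assumptions.

```python
from typing import Sequence


def windowed_matches(tokens: Sequence[str], grams: Sequence[str], window: int) -> int:
    """Count start positions where the grams occur in order, each one
    at most `window` tokens after the previous gram (window=1 means adjacent)."""
    count = 0
    for start, token in enumerate(tokens):
        if token != grams[0]:
            continue
        pos = start
        matched = True
        for gram in grams[1:]:
            # Tokens that lie within the window right after the previous gram.
            segment = list(tokens[pos + 1 : pos + 1 + window])
            if gram in segment:
                # Greedy choice: take the nearest occurrence (an assumption).
                pos = pos + 1 + segment.index(gram)
            else:
                matched = False
                break
        if matched:
            count += 1
    return count


def proximity_ratio(tokens: Sequence[str], grams: Sequence[str], window: int = 5) -> float:
    """Illustrative ratio: contiguous occurrences over windowed occurrences.
    Values near 1.0 suggest the grams nearly always appear as a fixed sequence."""
    exact = windowed_matches(tokens, grams, 1)
    loose = windowed_matches(tokens, grams, window)
    return exact / loose if loose else 0.0


if __name__ == "__main__":
    corpus = "he kicked the bucket and she kicked the old bucket".split()
    print(proximity_ratio(corpus, ["kicked", "the", "bucket"], window=3))
```

In this toy corpus the sequence occurs once contiguously and once with an intervening word, so the ratio falls between 0 and 1. A score of this kind needs no distributional assumptions about word frequencies, which is in keeping with the non-parametric character claimed for the CPR.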
Corpus size has traditionally been measured in number of words. Working with a single (European) lan...
With the proliferation of online off-the-peg corpora over the past decade or so, the use of corpora ...
There is nowadays a broad consensus on the use of corpora in linguistic research. This is, howe...
Most studies devoted to the use of phraseology by non-native speakers have shown that the situation ...
The automatic extraction of all collocations / phraseologisms from corpora has a crucial role to pla...
The notion of phraseology is now used across a wide range of linguistic disciplines but it is conspi...
Phraseology has often been criticized for its lack of terminological consistency and for its very di...
In the era of globalization, one may reasonably assume that the influence of English and internation...
We refer here to phraseology as the study of set phrases in the broadest sense, including partly fix...
In spite of the success of phraseology across a range of linguistic disciplines such as corpus lingu...
Corpora are currently enjoying ever-increasing success, and are no longer solely the domain of corp...
A corpus is a collection of authentic, non-elicited texts selected and assembled to study language. ...
This paper reports the results of an experiment with the PARSEME 1.1 dataset for English. While the...
How can the effects of corpora on the language learning process be effectively assessed? This is an ...