This paper reports the preliminary results of a large-scale experiment on extracting phraseological units (PUs, also called idioms) from large web corpora in four languages (English, Spanish, French, Chinese). A new algorithm based on metric clustering techniques, combined with optimized database storage and interaction with users and researchers through a web application, made it possible to reach high precision scores for the most common PUs in the four languages; further experimentation is still needed to establish recall levels for long n-grams. In the meantime, the freely accessible web application makes it possible to visualize the high proportion of phraseology in the broad sense (or of form...
Given the limited size of existing idiom corpora, we aim to enable progress in automatic idiom proce...
Phraseologisms are among the units that make up the vocabulary of a language. Phraseol...
We evaluate the possibility of learning, in an unsupervised manner, a list of idiomatic word combinatio...
The use of the World Wide Web for linguistic purposes is a fairly recent development. In their every...
The automatic extraction of all collocations / phraseologisms from corpora has a crucial role to pla...
This paper reports the results of an experiment with the Parseme 1.1 dataset for English. While the...
The paper describes an online German-Russian database for phraseological constructions (PhC), or syn...
The notion of phraseology is now used across a wide range of linguistic disciplines but it is conspi...
In the era of globalization, one may reasonably assume that the influence of English and internation...
Corpora are currently enjoying ever-increasing success, and are no longer solely the domain of corp...
This paper presents a possible architecture for a multilingual database of idioms. We discuss the ch...
As a fascinating and colorful part of the English language, idioms strongly affect fluency, but they are q...
We refer here to phraseology as the study of set phrases in the broadest sense, including partly fix...