National audience. Joint building of a corpus and a classifier for language registers in French. Language registers are an observable stylistic trait of texts and speech. However, they remain little studied in natural language processing. In this paper, we present a semi-supervised approach that jointly builds a corpus of texts labeled by register and an associated classifier. The approach starts from a small initial set of expert data. Using a massive, automatically retrieved collection of web pages, it proceeds iteratively, alternating between training an intermediate classifier and annotating new texts to augment the labeled corpus. We apply this approach to the formal, neutral, and informal registers. At the end of the ...
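The alternation described in this abstract (train an intermediate classifier, annotate new texts, enlarge the labeled corpus, repeat) can be illustrated with a minimal self-training sketch. This is not the authors' implementation: the excerpt does not say which classifier or selection rule they use, so the tiny naive Bayes model, the confidence `threshold`, and the names `train`, `predict`, and `self_train` are all assumptions made for illustration.

```python
from collections import Counter
import math


def train(labeled):
    """Fit a toy multinomial naive Bayes model on (text, label) pairs."""
    counts = {}       # label -> word counts
    docs = Counter()  # label -> number of documents
    for text, label in labeled:
        counts.setdefault(label, Counter()).update(text.lower().split())
        docs[label] += 1
    vocab = {w for c in counts.values() for w in c}
    return counts, docs, vocab


def predict(model, text):
    """Return (best_label, posterior probability of that label)."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    log_scores = {}
    for label, c in counts.items():
        n_words = sum(c.values())
        score = math.log(docs[label] / total_docs)
        for w in text.lower().split():
            if w in vocab:  # unknown words are ignored
                score += math.log((c[w] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    best = max(exp, key=exp.get)
    return best, exp[best] / sum(exp.values())


def self_train(seed, unlabeled, threshold=0.6, max_rounds=10):
    """Alternate training and annotation (hypothetical selection rule:
    keep a prediction only if its posterior reaches `threshold`)."""
    labeled, pool = list(seed), list(unlabeled)
    model = train(labeled)
    for _ in range(max_rounds):
        confident, rest = [], []
        for text in pool:
            label, conf = predict(model, text)
            if conf >= threshold:
                confident.append((text, label))
            else:
                rest.append(text)
        if not confident:  # nothing new to add: stop iterating
            break
        labeled.extend(confident)
        pool = rest
        model = train(labeled)  # retrain on the augmented corpus
    return model, labeled, pool


# Toy seed data standing in for the expert-labeled set (invented examples).
seed = [
    ("je vous prie d'agréer mes salutations distinguées", "formal"),
    ("le train part à huit heures", "neutral"),
    ("ouais c'est trop cool mec", "informal"),
]
unlabeled = ["ouais trop cool", "veuillez agréer mes salutations", "xyz"]
model, corpus, leftover = self_train(seed, unlabeled)
```

Texts that the intermediate model cannot label confidently stay in the pool rather than being forced into a register, which is one simple way to limit the noise added to the corpus at each round.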
This paper describes the ANNODIS resource, a corpus of written French enriched with several markups...
Natural language processing (NLP) and corpus linguistics have become, over the ...
International audience. We present in this paper a new system, MarsaTag, aiming at segmenting, tagging...
International audience. Language registers are a strongly perceptible characteristic of texts and spee...
This PhD thesis aims at automatically characterising language registers. From a linguistic point of ...
International audience. The casual, neutral, and formal language registers are highly perceptible in d...
Very few gold standard annotated corpora are currently available for French. We present an ongoing p...
National audience. The paper presents a study of linguistic features for the characterization of a tex...
International audience. Given the cost of speech transcription, it is very important to pool da...
This paper presents the current status of the French treebank developed at Paris 7 (Abeillé et al....
National audience. We present here the results of an experiment on part-of-speech annotation of a corp...
International audience. In this paper, we introduce a set of resources that we have derived from the E...
In this paper, a new methodology for speech corpora definition from internet documents is described,...