This paper presents a comparative study of methods for identifying multiword expressions in a Brazilian Portuguese corpus. First, we selected candidates based on bigram frequency. Second, we combined linguistic information, namely the grammatical classes of the words forming each bigram, with the frequency information in order to compare the performance of different classification algorithms. The study focuses on classification techniques such as support-vector machines (SVM), multi-layer perceptron, naïve Bayesian nets, decision trees and random forest. Third, we evaluated three different multi-layer perceptron training functions in the task of classifying differ...
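The first two steps described in the abstract above (frequency-based selection of bigram candidates, then pairing each candidate with the grammatical classes of its words) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `bigram_candidates`, the `min_freq` threshold, the tag labels, and the toy corpus are all assumptions made for illustration only.

```python
from collections import Counter


def bigram_candidates(tagged_sentences, min_freq=2):
    """Select bigrams seen at least `min_freq` times and attach simple features.

    `tagged_sentences` is a list of sentences, each a list of (word, tag) pairs.
    The threshold and the feature layout are illustrative assumptions.
    """
    freq = Counter()
    tag_patterns = {}
    for sentence in tagged_sentences:
        # Walk over adjacent token pairs within a sentence.
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            bigram = (w1.lower(), w2.lower())
            freq[bigram] += 1
            tag_patterns.setdefault(bigram, Counter())[(t1, t2)] += 1

    candidates = {}
    for bigram, count in freq.items():
        if count >= min_freq:
            # Use the most frequent tag pair observed for this bigram.
            (t1, t2), _ = tag_patterns[bigram].most_common(1)[0]
            candidates[bigram] = {"freq": count, "pos1": t1, "pos2": t2}
    return candidates


# Hypothetical toy corpus with made-up part-of-speech labels.
corpus = [
    [("banco", "N"), ("de", "PREP"), ("dados", "N")],
    [("o", "DET"), ("banco", "N"), ("de", "PREP"), ("dados", "N")],
    [("um", "DET"), ("banco", "N"), ("novo", "ADJ")],
]

candidates = bigram_candidates(corpus, min_freq=2)
for bigram, features in sorted(candidates.items()):
    print(bigram, features)
```

Each resulting feature dictionary (frequency plus the two grammatical classes) is the kind of input that could then be passed to any of the classifiers named in the abstract. The classification step itself is left out here, since the abstract does not specify its configuration.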
We propose a framework for using multiple sources of linguistic information in the task of identifyi...
The study reports the results of the exploration of a machine-readable corpus of Brazilian Portugues...
The purpose of this paper is to present an overview of Corpus Linguistics, characterizing it as an a...
Although corpus size is a well-known factor that affects the performance of many NLP tasks, for many...
Verbal multiword expressions (VMWEs) such as "to make ends meet" require special attention in NLP and ...
This paper presents some aspects of the first Portuguese frequency lexicon extracted from a corpus o...
This presentation focuses on an ongoing project which aims at the creation of a large lexical databa...
This work presents a comparative study between two different approaches to build an automatic classi...
Automatic Language Identification of written texts is a well-established area of research in Computa...
This paper describes a process for bootstrapping a computational lexicon of mu...
Multiword expressions (MWEs) are one of the obstacles to obtaining more accurate NLP systems...
This work presents a comparative study between two different approaches to build an aut...
This paper proposes and evaluates the use of linguistic information in the pre-processing...
In this article, we present the Brazilian Portuguese Lexicon, a new word-based...
Background: Term extraction is highly relevant as it is the basis for several tasks, such as the bui...