Automatic Language Identification of written texts is a well-established area of research in Computational Linguistics. State-of-the-art algorithms often rely on character n-gram models to identify the language of a text, with good results reported for European languages. In this paper, we propose the use of a character n-gram model and a word n-gram language model for the automatic classification of two written varieties of Portuguese: European and Brazilian. Accuracy reached 0.998 using character 4-grams.
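To make the idea of character n-gram classification concrete, here is a minimal sketch. It is not the authors' system. It assumes one count-based character 4-gram model per variety, add-one (Laplace) smoothing, and classification by highest log-probability. The names `char_ngrams`, `CharNgramModel`, and `classify`, the labels `pt-PT` and `pt-BR`, and the four toy training sentences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=4):
    """Return the overlapping character n-grams of text, padded with spaces."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Count-based character n-gram model (illustrative, not the paper's)."""

    def __init__(self, n=4):
        self.n = n
        self.counts = Counter()
        self.total = 0

    def train(self, texts):
        for text in texts:
            self.counts.update(char_ngrams(text, self.n))
        self.total = sum(self.counts.values())

    def log_prob(self, text, vocab_size):
        # Add-one smoothing so unseen n-grams get a small non-zero probability.
        denom = self.total + vocab_size
        return sum(
            math.log((self.counts[gram] + 1) / denom)
            for gram in char_ngrams(text, self.n)
        )


def classify(text, models):
    """Return the label whose model assigns text the highest log-probability."""
    vocab = set()
    for model in models.values():
        vocab.update(model.counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams
    return max(models, key=lambda label: models[label].log_prob(text, vocab_size))


# Toy training data (made up for this sketch, far too small for real use).
training = {
    "pt-PT": [
        "o autocarro chegou à estação de comboios",
        "de facto o telemóvel está na receção",
    ],
    "pt-BR": [
        "o ônibus chegou na estação de trem",
        "de fato o celular está na recepção",
    ],
}

models = {}
for label, texts in training.items():
    models[label] = CharNgramModel(n=4)
    models[label].train(texts)

if __name__ == "__main__":
    for sample in ["autocarro", "ônibus"]:
        print(sample, "->", classify(sample, models))
```

With a realistic corpus, the same structure would be trained on many documents per variety, and the n-gram order and smoothing method would be tuned on held-out data.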
This paper describes two automatic systems: a linguistic features extractor and a text readability c...
This paper presents a comparative study of different methods for the identification of multiword exp...
Brazilian Portuguese, the national language, spoken and used in Brazil, has its socio-historical orig...
This study presents a new language identification model for pluricentric languages that uses n-gram ...
Language identification is an important first step in many IR and NLP applications. Most publicly av...
We present a statistical approach to text-based automatic language identification that focuses on di...
This paper describes an accent identification system for Portuguese that explores different types of...
Language Identification is the process of determining in which natural language the content...
This paper describes three approaches to the task of automatically identifying the language a text i...
Statistical n-gram language modeling is used in many domains like speech recognition, language ident...