We investigate the performance of text-based language identification systems on the 11 official languages of South Africa, when n-gram statistics are used as features for classification. In particular, we compare support vector machines (SVMs) and likelihood-based classifiers on different amounts of input text, both from a closed domain and an open domain. With as few as 15 words of input text, reliable language identification is possible. Although the SVM is generally the more accurate classifier, the additional computational complexity of training this classifier may not be justified in light of the importance of using a large value for n.
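As a rough illustration of the likelihood-based approach mentioned above, the following Python sketch scores a text against per-language character n-gram counts and picks the language with the highest log-likelihood. The class name `NgramLikelihoodClassifier`, the trigram default, the add-one smoothing, and the two toy training sentences are assumptions made for illustration only; they are not taken from the study, which covers all 11 languages and also evaluates SVMs.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return the overlapping character n-grams of text (lowercased, space-padded)."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLikelihoodClassifier:
    """Hypothetical likelihood-based classifier over character n-gram counts."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram frequencies
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in any language

    def fit(self, samples):
        """samples: iterable of (language_label, text) pairs."""
        for lang, text in samples:
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        return self

    def log_likelihood(self, text, lang):
        """Sum of add-one-smoothed log probabilities of the text's n-grams."""
        v = len(self.vocab) + 1  # one extra slot for unseen n-grams
        counts, total = self.counts[lang], self.totals[lang]
        return sum(
            math.log((counts[g] + 1) / (total + v))
            for g in char_ngrams(text, self.n)
        )

    def predict(self, text):
        """Return the language whose n-gram model gives the highest likelihood."""
        return max(self.counts, key=lambda lang: self.log_likelihood(text, lang))


# Toy training data (illustrative only, far smaller than any real corpus).
training = [
    ("en", "the quick brown fox jumps over the lazy dog and the cat sleeps in the sun"),
    ("af", "die vinnige bruin jakkals spring oor die lui hond en die kat slaap in die son"),
]
clf = NgramLikelihoodClassifier(n=3).fit(training)

print(clf.predict("the dog sleeps"))
print(clf.predict("die hond slaap"))
```

Raising `n` makes each n-gram more language-specific but enlarges the count tables, which is one way to read the trade-off the abstract draws between a large n and the cost of training a more complex classifier.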
The present contribution revolves around efficient approaches to language clas...
Language models are the foundation of current neural network-based models for natural language under...
In this paper we describe the language identification system we developed for the Discriminating Sim...
The classification accuracy of text-based language identification depends on several factors, includ...
We present a statistical approach to text-based automatic language identification that focuses on di...
Language identification is an important pre-process in many data management and information retrieva...
South Africa has eleven official languages, ten of which are considered “resource-scarce”. For these...
In this paper, we explore the use of the Support Vector Machines (SVMs) to learn a discriminatively ...
Language identification is a text classification task for identifying the language of a given text. ...
In this paper we present two experiments conducted for comparison of different language identificati...
This work was supported by the Department of Arts and Culture. The NCHLT speech corpus contains wide-...
Text-based language identification is the task of automatically recognizing a language fr...
Language identification of written text has been studied for several decades. Despite this fact, mos...
Language identification is the process of determining in which natural language the content...
Thesis (M. Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campu...