Loanword identification has been studied in recent years as a way to alleviate data sparseness in several natural language processing (NLP) tasks, such as machine translation and cross-lingual information retrieval. However, recent studies on this topic have usually focused on high-resource languages (such as Chinese, English, and Russian). For low-resource languages such as Uyghur and Mongolian, limited resources and a lack of annotated data mean that loanword identification tends to perform worse. To overcome this problem, we first propose a lexical constraint-based data augmentation method to generate training data for low-resource language loanword identification; then, a loanword identification model based on...
Some natural languages belong to the same family or share similar syntactic and/or semantic regulari...
This research extends our earlier work on using machine translation (MT) and wo...
Obtaining information about loan words and irregular morphological patterns can be difficult for low...
Natural Language Processing (NLP) is a field of computer science, artificial intelligence and comput...
Language identification is widely used in machine learning, text mining, information retrieval, and ...
This paper describes an accurate, extensible method for automatically classifying unknown foreign wo...
Automatic language identification (LID) is the automatic process whereby the identity of the...
In this paper we explore the use of lexical information for language identification (LID). Our refer...
How can we effectively develop speech technology for languages where no transcribed data is availabl...
Automatic speech recognition systems with a large vocabulary and other natural language processing a...
This thesis is a quantitative study on the scale of borrowability and the scale of necessity of Engl...
This paper reports on investigations using two techniques for language model t...
A collection of words in Kazakh and Kyrgyz that are of Mongolic origin. The data set was used in th...
Uyghur is a morphologically rich and typical agglutinating language, and morphological segmentation ...