Unknown words, or out-of-vocabulary (OOV) words, pose a significant problem for morphological analysers, syntactic parsers, MT systems and other NLP applications. Unknown words make up 29% of the word types in a large Arabic corpus used in this study. With today's corpus sizes exceeding 10^9 words, it becomes impossible to manually check corpora for new words to be included in a lexicon. We develop a finite-state morphological guesser and integrate it with a machine-learning-based pre-annotation tool in a pipeline architecture for extracting unknown words, lemmatizing them, and assigning them a priority weight for inclusion in a lexical database. The processing is performed on a corpus of contemporary Arabic of 1,089,111,204 words. Our...
We develop an open-source large-scale finite-state morphological processing toolkit (AraComLex) for...
We explore the application of memory-based learning to morphological analysis and part-of-speech ta...
Modelling the mental lexicon focuses on processing and storage dynamics, since lexical organisation ...
We develop an open-source large-scale finite-state morphological processing toolkit (AraComLex) for M...
We develop an open-source large-scale finite-state morphological processing toolkit (AraComLex) for M...
Abstract. Current Arabic lexicons, whether computational or otherwise, make no distinction between e...
We provide lexical profiling for Arabic by covering two important linguistic aspects of Arabic lexic...
Broad-coverage language resources which provide prior linguistic knowledge must improve the accuracy...
Broad-coverage language resources which provide prior linguistic knowledge must improve the accuracy...
This paper presents a study of the impact of using simple and complex morphological clues to improve...
Applications of statistical Arabic NLP in general, and text mining in specific, along with the tools...
We describe a model for the lexical analysis of Arabic text, using the lists of alternatives suppl...
We describe the generation of an Arabic full-form lexicon and its conversion into a two-level Finite...
This article describes the construction of a lexicon and a morphological description for standard Ar...
Corpus-based methods have been widely used to tackle NLP tasks after the advent of annotated corpora...