In this paper, we describe the release of a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who use three different taggers and then apply a voting scheme to disambiguate among the choices suggested by each tagger. We run this complex ensemble on a large monolingual corpus and release the tagged corpus. Additionally, we use this data to train a single standalone tagger, which we hope will significantly simplify Urdu processing. The standalone tagger obtains an accuracy of 88.74% on test data.
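The voting idea mentioned above can be illustrated with a minimal sketch. It is not the scheme of Jawaid and Bojar (2012), whose details are not given here: it assumes plain per-token majority voting over the outputs of several taggers, with a hypothetical fallback to one designated tagger when no tag wins a strict majority. The function name `vote_tags`, the `fallback` parameter, and the tag labels in the usage example are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence


def vote_tags(predictions: Sequence[Sequence[str]], fallback: int = 0) -> List[str]:
    """Combine per-token tags from several taggers by majority vote.

    predictions[i][j] is the tag that tagger i assigns to token j.
    `fallback` (an assumption of this sketch) is the index of the tagger
    whose tag is used when no tag has a strict majority.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must tag the same number of tokens")

    voted = []
    for token_tags in zip(*predictions):
        tag, count = Counter(token_tags).most_common(1)[0]
        # No strict majority (e.g. all three taggers disagree):
        # defer to the designated fallback tagger.
        if count * 2 <= len(token_tags):
            tag = token_tags[fallback]
        voted.append(tag)
    return voted


# Hypothetical outputs of three taggers for a three-token sentence.
tagger_a = ["NN", "VB", "PSP"]
tagger_b = ["NN", "NN", "PSP"]
tagger_c = ["PN", "VB", "CC"]
combined = vote_tags([tagger_a, tagger_b, tagger_c])
```

Here each token receives the tag chosen by at least two of the three taggers. A standalone tagger trained on corpus output of this kind would replace the three-way ensemble at run time.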
Urdu is the national language of Pakistan, also the most widely spoken and understandable language o...
This study describes a Natural Language Processing (NLP) toolkit, as the first contribution of a lar...
The rise of social networking sites and blogs has stimulated a bull market in personal opinion; consu...
We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We exte...
In this paper, we focus on improving part-of-speech (POS) tagging for Urdu by using existing tools ...
Urdu is a language of the Indo-Aryan family, widely spoken in India and Pakistan, and an important m...
While part-of-speech tagging is an established technology for Western European languages such as Eng...
We address the problem of Part-of-Speech (POS) tagging of Urdu. POS tagging is the process of assign...
A variety of verb phrases exist in Urdu including simple verb phrases, conjunct verb phrases and co...
This work presents the development of the URDU.KON-TB treebank, its annotation evaluation & guidelin...
Text tokenization is a fundamental pre-processing step for almost all the information processing app...