A corpus enhanced with linguistic tags is one of the primary tools used in text processing tasks such as information retrieval, text extraction, and text mining. In a corpus development effort, the role of a POS tagger is to assign a linguistic tag to every textual token. POS annotation relies heavily on a tagset based on a linguistic theory. Text processing in Persian follows this common practice as well. Several tagsets have been introduced so far to annotate Persian corpora. However, each tagset follows its own standard and linguistic theory. The resulting tagsets contain a limited number of tags, which renders them inadequate for a larger scope of research. This study is inspired by EAGLES, MULTEXT-East, positional...
Farsi (Persian) is a low-resource language that suffers from the data sparsity problem and a lack of...
While part-of-speech tagging is an established technology for Western European languages such as Eng...
Part-Of-Speech (POS) tagging is the process of marking up the words in a text with their correspond...
In (Sagot and Walther, 2010), the authors introduce an advanced tokenizer and ...
Part-Of-Speech (POS) tagging is the process of marking-up the words in a text with their correspondi...
Currently, most linguistic studies benefit from valid linguistic data available in corpora. Compilin...
This paper describes a method based on morphological analysis of words for a Persian Part-Of-Speech ...
Persian, with about 100,000,000 speakers worldwide, belongs to the group of languages with less...
This thesis presents open source resources in the form of annotated corpora and modules for automati...
In many applications of natural language processing (NLP) grammatically tagged corpora are needed. T...
Many NLP applications need fundamental tools to convert the input text into appropriate form or form...
This article discusses tag sets used when PoS-tagging a corpus, that is, enriching a corpus by addin...
One of the fundamental tasks in natural language processing is part of speech (POS) tagging. A POS t...
Collocation is an important level of lexis and its significance in the pedagogy of a language is an ...