Tokenisers, lemmatisers and POS taggers are vital to the linguistic and digital advancement of any language. In this paper, we present an open-source toolkit for Malay incorporating a word and sentence tokeniser, a lemmatiser and a partial POS tagger, based on heavy reuse of pre-existing language resources. We outline the software architecture of each component, and present an evaluation of each over a 26K-word sample of Malay text.
May, 200
This paper gives a brief history of UTMK, a computer-aided translation unit, and reports on its proj...
The development of the various Malay corpora has given many researchers the opportunity to ex...
The Malay language in Malaysia is commonly mixed with English and slang words especially when observ...
Research on Malay Part-of-Speech (POS) tagging has greatly increased over the past few years. Based ...
Part-of-Speech (POS) Tagging is one of the fundamental tasks in Natural Language Processing (NLP) i...
Abstract. This research represents the first attempt to produce a working system for the automatic p...
Due to the growth of electronic documents and the incessant increase of the power and capacity of co...
Research in Malay Part-of-Speech (POS) has increased considerably in the past few years. From the lit...
The research on part of speech (POS) tagging has been widely applied and used through a variety of a...
Generally, a corpus serves as the source of data for various types of research. As such, there are a...
The Online Malay Language Corpus-based Lexical Database for Primary Schools discussed in this paper ...
The structure of Malay presents the corpus linguist with an extremely interesting problem. At high s...
Processing the meaning of words in social media texts, such as tweets, is challenging in natural lan...
This paper describes the design and creation of a monolingual parallel corpus for the Malay language...