Research in machine translation and corpus annotation has greatly benefited from the increasing availability of word-aligned parallel corpora. This paper presents ongoing research on the development and application of the SAWA corpus, a two-million-word English-Swahili parallel corpus. We describe the data collection phase and focus on the difficulties of finding appropriate and easily accessible data for this language pair. In the data annotation phase, the corpus was semi-automatically sentence- and word-aligned, and morphosyntactic information was added to both the English and Swahili portions of the corpus. The annotated parallel corpus allows us to investigate two possible uses. We describe experiments with the projection of part-of-spe...
Abstract: This article discusses some problems encountered in the processing of the Shona corpus. Mo...
Abstract: In this paper the writer examines problems the African Languages Lexical (ALLEX) Project (...
Speech corpus being the basic requirement for the development of Automatic speech recognition (ASR) ...
In this article we survey four different electronic bilingual dictionaries for the language pair Swa...
A Project Report Submitted to the School of Science and Technology in Partial Fulfillment of the Req...
This paper describes our approach to create a neural machine translation system to translate between...
This paper explores the review of Swahili text and speech databases/corpus in different dimensions i...
Finding large amounts of text data for use in natural language technology is difficult for under-res...
Dione CMB, Kuhn J, Zarrieß S. Design and Development of Part-of-Speech-Tagging Resources for Wolof (...
We’ve developed an open-source, high quality isiZulu parallel corpus that comes from a mixture of do...
This paper deals with translation of English documents to Oromo using statistical methods. Whereas E...
This dataset contains 100,000 Kiswahili sentences. For more information on how the dataset was creat...
Computational morphological analysis is an important first step in the automatic treatment of natura...