It is hard to collect corpora used to train good language models for many minority languages. Cantonese, one of the most popular Chinese dialects, is such a kind of language, lacking of language materials for language model training. This is a very big obstruction for the processing of Cantonese language. Unlike many other languages, there are great differences between written and colloquial Cantonese. What’s more, people in Hong Kong are using mixed Cantonese and English while they talk, which is also a special characteristic of this language. Beyond these, the materials collected from different sources have different proportion of colloquial Cantonese sentences, which means that different sources should not be equally treated. We develope...
In this paper, we will present the up-to-date status for the development of several large-scale Cant...
This study aims to investigate the usage of written Cantonese by different age groups in Hong Kong. ...
Hong Kong Cantonese is well-known for borrowing numerous English words, many of which were borrow...
This thesis presents a colloquial language modeling technique for spontaneous Cantonese speech recog...
This paper describes our recent work on developing a large vocabulary speech database for Cantonese....
Automatic recognition of Cantonese tones has long been regarded as a difficult task. Cantonese has o...
Cantonese is a major Chinese dialect with a complicated tone system. This research focuses on quanti...
We consider the possibility of Cantonese and English reciprocally influencing vowel space in Toronto...
International audienceThough Cantonese is the most influential variety of Chinese other than Mandari...
This paper introduces the development of ShefCE: a Cantonese-English bilingual speech corpus from L2...
Supplemental material, IJB808002_Supplemental_Material_REV2 for Production of Cantonese classifiers ...
This paper describes our recent work on automatic recognition of Cantonese. Cantonese is one of the ...
This paper introduces Cancorp, a one million character child language corpus constructed to study th...
This dissertation examines the development of Cantonese in young heritage speakers (age of testing 3...
Our study analyzed and compared three Cantonese language corpora (large textual databases) in order ...
In this paper, we will present the up-to-date status for the development of several large-scale Cant...
This study aims to investigate the usage of written Cantonese by different age groups in Hong Kong. ...
Hong Kong Cantonese is well-known for borrowing numerous English words, many of which were borrow...
This thesis presents a colloquial language modeling technique for spontaneous Cantonese speech recog...
This paper describes our recent work on developing a large vocabulary speech database for Cantonese....
Automatic recognition of Cantonese tones has long been regarded as a difficult task. Cantonese has o...
Cantonese is a major Chinese dialect with a complicated tone system. This research focuses on quanti...
We consider the possibility of Cantonese and English reciprocally influencing vowel space in Toronto...
International audienceThough Cantonese is the most influential variety of Chinese other than Mandari...
This paper introduces the development of ShefCE: a Cantonese-English bilingual speech corpus from L2...
Supplemental material, IJB808002_Supplemental_Material_REV2 for Production of Cantonese classifiers ...
This paper describes our recent work on automatic recognition of Cantonese. Cantonese is one of the ...
This paper introduces Cancorp, a one million character child language corpus constructed to study th...
This dissertation examines the development of Cantonese in young heritage speakers (age of testing 3...
Our study analyzed and compared three Cantonese language corpora (large textual databases) in order ...
In this paper, we will present the up-to-date status for the development of several large-scale Cant...
This study aims to investigate the usage of written Cantonese by different age groups in Hong Kong. ...
Hong Kong Cantonese is well-known for borrowing numerous English words, many of which were borrow...