In this paper we propose a novel word representation for Chinese based on a state-of-the-art word embedding approach. Our main contribution is to integrate distributional representations of Chinese characters into the word embedding. Recent related work on European languages has demonstrated that information from inflectional morphology can reduce the problem of sparse data and improve word representations. Chinese has very little inflectional morphology, but there is potential for incorporating character-level information. Chinese characters are drawn from a fixed set – with just under four thousand in common usage – but a major problem with using characters is their ambiguity. In order to address this problem, we disambiguate the characte...
This paper addresses two remaining challenges in Chinese word segmentation. The challenge in HLT is ...
Text representation can map text into a vector space for subsequent use in numerical calculations an...
Previously, researchers paid no attention to the creation of unambiguous morpheme embeddings indepen...
Distributed word representations are very useful for capturing semantic information and have been su...
Most word embedding methods take a word as a basic unit and learn embeddings according to words’ ex...
In the Chinese language, words consist of characters each of which is composed of one or more compon...
We propose cw2vec, a novel method for learning Chinese word embeddings. It is based on our observati...
Distributional Similarity has attracted considerable attention in the field of natural language proc...
This thesis deals with Chinese characters (Hanzi): their key characteristics and how they could be u...
Characters play an important role in the Chinese language, yet computational processing of Chinese ...
The Chinese and Japanese languages share Chinese characters. Since the Chinese characters in Japanes...
One of the central concerns in theories of reading skills is the role of phonology in access to word...
Designations have been used very inconsistently in deciphering the nature of the Chinese writing sys...
This thesis looks into the problem of learning Chinese characters for foreign language learners and ...