We propose cw2vec, a novel method for learning Chinese word embeddings. It is based on our observation that exploiting stroke-level information is crucial for improving the learning of Chinese word embeddings. Specifically, we design a minimalist approach to exploit such features, by using stroke n-grams, which capture semantic and morphological level information of Chinese words. Through qualitative analysis, we demonstrate that our model is able to extract semantic information that cannot be captured by existing methods. Empirical results on the word similarity, word analogy, text classification and named entity recognition tasks show that the proposed approach consistently outperforms state-of-the-art approaches such as word-based word2...
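As a rough illustration of the stroke n-gram idea described in the abstract above, the following minimal sketch enumerates the contiguous n-grams of a stroke sequence. It makes several assumptions that the abstract does not state: each stroke is encoded as a single digit, the n-gram lengths run from 3 to 5, and the sample sequence `13434` is a toy input. The function name `stroke_ngrams` is hypothetical and this is not the authors' implementation.

```python
from typing import List


def stroke_ngrams(strokes: str, n_min: int = 3, n_max: int = 5) -> List[str]:
    """Return every contiguous substring of `strokes` with length n_min..n_max.

    `strokes` is assumed to be a word's stroke sequence with one character
    per stroke (a hypothetical encoding chosen for this sketch).
    """
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the stroke sequence.
        for i in range(len(strokes) - n + 1):
            grams.append(strokes[i:i + n])
    return grams


# Toy stroke sequence (hypothetical input, not data from the paper).
toy_word_strokes = "13434"
print(stroke_ngrams(toy_word_strokes))
# → ['134', '343', '434', '1343', '3434', '13434']
```

In a model of this kind, each such n-gram would typically receive its own embedding, and a word's representation would be built from the embeddings of its n-grams; how cw2vec combines them is not specified in the abstract.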
Distributional Similarity has attracted considerable attention in the field of natural language proc...
This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging b...
This paper addresses two remaining challenges in Chinese word segmentation. The challenge in HLT is ...
In this paper we propose a novel word representation for Chinese based on a state-of-the-art word em...
Most word embedding methods take a word as a basic unit and learn embeddings according to words’ ex...
In the Chinese language, words consist of characters each of which is composed of one or more compon...
Chinese event extraction uses word embedding to capture similarity, but suffers when handling previo...
Chinese characters carry a wealth of morphological and semantic information; therefore, zero-shot Ch...
Distributed word representations are very useful for capturing semantic information and have been su...
Previously, researchers paid no attention to the creation of unambiguous morpheme embeddings indepen...
Chinese characters have semantic-rich compositional information in radical form. While almost all pr...
The Web provides a large-scale corpus for researchers to study language usage in the real world. Develo...
Recent work has shown success in learning word embeddings with neural network language models (NNLM)...
The learning of Chinese has become very important and popular all over the world. For most Chinese o...