Code-switched documents are common in social media, providing evidence for polylingual topic models to infer aligned topics across languages. We present Code-Switched LDA (csLDA), which infers language specific topic distributions based on code-switched documents to facilitate multi-lingual corpus analysis. We experiment on two code-switching corpora (English-Spanish Twitter data and English-Chinese Weibo data) and show that csLDA improves perplexity over LDA, and learns semantically coherent aligned topics as judged by human annotators.
Many people are multilingual and they may draw from multiple language varieties when writing their m...
We study the problem of linking information between different idiomatic usages of the same language,...
We present a new corpus of Twitter data annotated for codeswitching and borrowing between Spanish an...
In this paper, we present the Polylingual Labeled Topic Model, a model which combines the ...
This paper explores bridging the content of two different languages via latent topics. Specifically,...
We study the problem of extracting cross-lingual topics from non-parallel multilingual text datasets...
Topic models are a useful tool for analyzing large text collections, but have previously been applie...
Multilingual users of social media sometimes use multiple languages during conversation. Mixing mult...
Probabilistic topic models are unsupervised generative models which model document content as a two-...
Code-mixing or language-mixing is a linguistic phenomenon where multiple languages mix together durin...
Topic modeling is a widely used approach to analyzing large text collections. A small number of mult...