The purpose of this study is to estimate and compare the entropy and redundancy of written English and Swedish. We also investigate and compare the entropy and redundancy of the language used on Twitter. This is done by extracting sequences of n consecutive characters, called n-grams, and calculating their frequencies. No exact values can be obtained, since the amount of text is finite, whereas the entropy is defined in the limit of infinite text length. However, we do obtain estimates for n = 1, ..., 6, and the results show that written Swedish has higher entropy than written English and that the redundancy of Swedish is lower. When comparing Twitter with the standard written languages, we find that Twitter has higher entropy and lower redundancy.
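As a concrete illustration of the procedure described above, the following is a minimal sketch of n-gram frequency counting together with the standard Shannon block-entropy and redundancy estimates. It is not the authors' code; the function names, the use of Python, and the choice of a 27-symbol alphabet (a-z plus space) are assumptions made for the example.

```python
import math
from collections import Counter

def ngram_entropy(text: str, n: int) -> float:
    """Block entropy H_n = -sum p(b) log2 p(b) over all n-grams b in the text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def per_char_entropy(text: str, n: int) -> float:
    """Shannon's n-gram estimate F_n = H_n - H_{n-1}, with F_1 = H_1."""
    if n == 1:
        return ngram_entropy(text, 1)
    return ngram_entropy(text, n) - ngram_entropy(text, n - 1)

def redundancy(f_n: float, alphabet_size: int) -> float:
    """Redundancy R = 1 - F_n / log2(alphabet size)."""
    return 1.0 - f_n / math.log2(alphabet_size)

if __name__ == "__main__":
    # Toy corpus; a real estimate needs a large text sample.
    sample = "the quick brown fox jumps over the lazy dog " * 100
    for n in range(1, 7):
        f_n = per_char_entropy(sample, n)
        print(f"n={n}: F_n = {f_n:.3f} bits/char, "
              f"R = {redundancy(f_n, 27):.3f}")  # 27 = a-z plus space (assumed)
```

The estimate F_n approaches the true per-character entropy of the source only as n grows and the text length tends to infinity, which is why a finite corpus yields only approximate values for small n, as noted in the abstract.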