This thesis is a proof-of-concept for embedding Swedish documents using continuous vectors. These vectors can be used as input to any subsequent task and serve as an alternative to discrete bag-of-words vectors. The differences go beyond fewer dimensions, as the continuous vectors also hold contextual information. This means that documents with no shared vocabulary can be directly identified as contextually similar, which is impossible with bag-of-words vectors. The continuous vectors are the result of neural language models and algorithms that pool the model output into document-level representations. This thesis reviews the latest research on such models, starting from the Word2Vec algorithms. A wide variety of neural ...
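As a minimal sketch of the pooling step described above, the snippet below trains word vectors with gensim's Word2Vec and averages them into fixed-size document vectors. The toy Swedish corpus, the gensim hyperparameters, and the choice of mean pooling are illustrative assumptions, not the thesis's exact setup.

```python
# Minimal sketch: train word vectors and mean-pool them into document vectors.
# Corpus, hyperparameters, and pooling choice are illustrative assumptions.
import numpy as np
from gensim.models import Word2Vec

documents = [
    "domstolen avslog överklagandet".split(),
    "rätten biföll ansökan".split(),
]

# Train a small Word2Vec model (gensim >= 4.0 uses `vector_size`).
model = Word2Vec(sentences=documents, vector_size=50, window=3,
                 min_count=1, epochs=50)

def embed_document(tokens, wv):
    """Average the word vectors of all in-vocabulary tokens."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

doc_vecs = np.stack([embed_document(d, model.wv) for d in documents])

# Cosine similarity lets documents with no shared vocabulary still be
# compared through the contextual information in their word vectors.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(doc_vecs[0], doc_vecs[1]))
```

Mean pooling is only one possible pooling algorithm; weighted schemes such as TF-IDF-weighted averaging follow the same pattern of reducing per-word vectors to a single document-level representation.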
Recent advances in neural language models have contributed new methods for learning distributed vect...
We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents i...
This thesis takes up text categorization. The first part describes several selected algorithm...
Word vectors, embeddings of words into a low-dimensional space, have been shown to be useful for a l...
When classifying texts using a linear classifier, the texts are commonly represented as feature vect...
This work highlights some important factors for consideration when developing word vector representa...
Unsupervised learning of text representations aims at converting natural language into vector represen...
The research topic studied in this dissertation is word representation learning, which aims to learn...
In this work I detail the compilation of a unique corpus of Norwegian court decisions. I utilize thi...
We propose two novel model architectures for computing continuous vector representations of words fr...
In today’s digital world, more and more emails and messages must be sent, processed and hand...
To process textual data using statistical methods like Machine Learning (ML), the data often...
This thesis explores and compares various methods for producing vector representation of unstructure...
Word vector representation is widely used in natural language processing tasks. Most word vectors ar...