I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

This work studies how the position of words in a text affects understanding and tracking of its content. In this thesis we present two uses of word position in text: topic word selectors and topic flow signals. The topic word selectors identify important words, called topic words, by their spread through a text. The underlying assumption is that words that repeat across the text are likely to be more relevant to its main topic than ones that are concentrated in smal...
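The spread assumption can be illustrated with a minimal sketch. This is not the thesis's selector, whose exact definition is not given here. It assumes a simple hypothetical measure: split the token sequence into equal-length segments and score each word by the fraction of segments in which it occurs. The names `word_spread` and `topic_words`, the segment count, and the threshold are all illustrative assumptions.

```python
import re
from collections import defaultdict


def word_spread(text, n_segments=4):
    """Return, for each word, the fraction of segments it appears in.

    Assumption: 'spread' is approximated by segment coverage; the
    thesis may define it differently.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {}
    # Never use more segments than there are tokens.
    n_segments = min(n_segments, len(tokens))
    segments_seen = defaultdict(set)
    for i, tok in enumerate(tokens):
        # Map token position i to a segment index in [0, n_segments).
        segments_seen[tok].add(i * n_segments // len(tokens))
    return {w: len(s) / n_segments for w, s in segments_seen.items()}


def topic_words(text, n_segments=4, min_spread=0.75):
    """Select words whose segment coverage reaches min_spread (hypothetical threshold)."""
    spread = word_spread(text, n_segments)
    return sorted(w for w, s in spread.items() if s >= min_spread)


if __name__ == "__main__":
    # 'model' and 'burst' both occur four times, but only 'model'
    # is distributed over the whole text.
    sample = "model burst burst burst burst model model a b c d model"
    print(word_spread(sample))
    print(topic_words(sample))
```

In the sample, the two words have the same frequency, so a pure count could not separate them; only the positional information distinguishes the word that recurs throughout from the one that is concentrated at the start.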
One of the major challenges of mining topics from a large corpus is the quality of the constructed t...
Most topic models, such as latent Dirichlet allocation, rely on the bag of words assumption. However...
We present an unsupervised method for the generation from a textual corpus of sets of keywords, that...
Dividing documents into topically-coherent units and discovering their topic might have many uses. W...
Most documents are about more than one subject, but the majority of natural language processing algo...
To understand text, we must relate it with specified situations. This paper, on the basis of such an...
Automatic identification of influential segments from a large amount of data is an important part of...
Text mining is a field that automatically extracts previously unknown and useful informa...
User generated content in the form of customer reviews, blogs or tweets is an emerging and rich sour...
Term-based approaches can extract many features in text documents, but most include noise. Many popu...
We investigate the problem of text segmentation by topic. Applications for this task include topic...
A massive amount of online information is natural language text: newspapers, blog articles, forum po...
Most documents are about more than one subject, but the majority of natural language processing algor...
Topic indexing is the task of identifying the main topics covered by a document. These are useful fo...
We provide a brief, non-technical introduction to the text mining methodology known as topic modelin...