We built a system for the automatic creation of a text-based topic hierarchy, meant to be used in a geographically defined community. This poses two main problems. First, the texts mix standard language with a community-related dialect, so dialect words should be corrected to standard words as far as possible. Second, the texts must be clustered hierarchically by topic in an automatic way. We deal with the correction of dialect words by performing a nearest neighbor search over a dynamic set of known words, using a set of transition rules from dialect to standard words that are learned from a pairwise lexicon. We tackle the clustering problem by implementing a hierarchical co-clustering algorithm that automatical...
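The dialect correction step described above can be illustrated with a small sketch. It is not the system's actual implementation: the abstract does not specify how transition rules are represented or which distance is used, so the code assumes single-character substitution rules and a weighted edit distance in which a learned dialect-to-standard substitution costs less than an arbitrary one. The function names (`learn_rules`, `distance`, `correct`), the `rule_cost` value, and the toy lexicon entries are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def learn_rules(lexicon):
    """Count single-character substitutions seen in (dialect, standard) pairs.

    Assumption: a "transition rule" is modeled as one character replaced by
    another; the original system may use richer rules.
    """
    counts = Counter()
    for dialect, standard in lexicon:
        matcher = SequenceMatcher(None, dialect, standard, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only equal-length replacements map cleanly to character pairs.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                for a, b in zip(dialect[i1:i2], standard[j1:j2]):
                    counts[(a, b)] += 1
    return counts


def distance(word, target, rules, rule_cost=0.2):
    """Edit distance where learned substitutions cost rule_cost instead of 1."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, a in enumerate(word, 1):
        cur = [float(i)]
        for j, b in enumerate(target, 1):
            if a == b:
                sub = 0.0
            elif (a, b) in rules:
                sub = rule_cost
            else:
                sub = 1.0
            cur.append(min(prev[j] + 1.0,       # delete a
                           cur[j - 1] + 1.0,    # insert b
                           prev[j - 1] + sub))  # substitute a -> b
        prev = cur
    return prev[-1]


def correct(word, known, rules):
    """Return the known word nearest to `word` (ties broken alphabetically)."""
    return min(known, key=lambda w: (distance(word, w, rules), w))


# Toy pairwise lexicon of (dialect, standard) forms, for illustration only.
rules = learn_rules([("wat", "was"), ("dat", "das")])

# The set of known words is dynamic: new standard words can be added any time.
known = {"was", "das", "wald", "hat"}
known.add("es")

print(correct("et", known, rules))
```

A linear scan over the known words, as used here, is the simplest form of nearest neighbor search. For a large vocabulary a metric index would likely be needed, but note that a rule-weighted distance of this kind is not symmetric, which matters for some index structures.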
While automated methods for information organization have been around for several decades now, expon...
Topic hierarchies can help researchers to develop a quick and concise understanding of the main them...
Hierarchies have long been used for organization, summarization, and access to information. In this ...
It is crucial in many information systems to organize short text segments, such as keywords in docum...
This thesis proposes a novel model for automatically generating a topic map for a document corpus with n...
This paper proposes a framework to automatically construct taxonomies from a corpus of text document...
One of the major challenges of mining topics from a large corpus is the quality of the constructed t...
Automatically identifying related specialist terms is a difficult and important task required to un...
In this paper we apply various clustering algorithms to the dialect pronunciation data. At the same...
This chapter describes a novel multistage method for linguistic clustering of large collections of t...
Building taxonomies for Web content manually is costly and time-consuming. An alternative is to allo...
In this paper, we introduce a new clustering algorithm for discovering and describing the topics com...
The most popular topic modelling algorithm, Latent Dirichlet Allocation, produ...
In this paper, we present a novel method for automatically building hierarchical topic structures of...
This paper presents a technique to automatically derive ontologies which is based on hierarchical cl...