The majority of machine learning research has focused on building models and inference techniques with sound mathematical properties and cutting-edge performance. Little attention has been devoted to the development of data representations that can be used to improve a user's ability to interpret the data and machine learning models to solve real-world problems. In this paper, we quantitatively and qualitatively evaluate an efficient, accurate and scalable feature-compression method using latent Dirichlet allocation for discrete data. This representation can effectively communicate the characteristics of high-dimensional, complex data points. We show that the improvement of a user's interpretability through the use of a topic modeling-b...
This paper investigates, from information theoretic principles, a learning pro...
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requ...
This paper presents an efficient framework for error-bounded compression of high-dimensional discret...
The majority of machine learning research has been focused on building models and inference techniq...
Automatic text categorization is one of the key techniques in information retrieval and the data min...
This paper focuses on large-scale unsupervised feature selection from text. We expand upon the recen...
Digital media tend to combine text and images to express richer information, especially on image hos...
This paper presents novel datasets providing numerical representations of ICD-10-CM codes b...
In this paper, I apply latent Dirichlet allocation (LDA) to cluster 100,000 health-related articles u...
Over the past couple of decades, we have witnessed a huge explosion in data generation from almost ever...
Finding a dataset of minimal cardinality to characterize the optimal parameters of a model is of par...
13 pages, 7 figures. Submitted for publication. This paper investigates, from information theoretic gr...
The automatic discovery of a significant low-dimensional feature representation from a given data se...
For a language model (LM) to faithfully model human language, it must compress vast, potentially in...
Despite many years of research into latent Dirichlet allocation (LDA), applying LDA to collections o...