When building large-scale machine learning (ML) programs, such as big topic models or deep neural nets, one usually assumes such tasks can only be attempted with industrial-sized clusters with thousands of nodes, which are out of reach for most practitioners or academic researchers. We consider this challenge in the context of topic modeling on web-scale corpora, and show that with a modest cluster of as few as 8 machines, we can train a topic model with 1 million topics and a 1-million-word vocabulary (for a total of 1 trillion parameters), on a document collection with 200 billion tokens -- a scale not yet reported even with thousands of machines. Our major contributions include: 1) a new, highly efficient O(1) Metropolis-Hastings samplin...
The sizes of modern digital libraries have grown beyond our capacity to comprehend manually. Thus we...
After a decade of accelerated progress in the different areas of machine learning (ML), it has becom...
The machine learning (ML) industry has taken great strides forward and is today facing new challenge...
When building large-scale machine learning (ML) programs, such as massive topic models or deep neura...
In real-world industrial applications of topic modeling, the ability to capture gigantic conceptual ...
How can one build a distributed framework that allows efficient deployment of a wide spectrum of mo...
Learning meaningful topic models with massive document collections which contain millions of documen...
We present LDA*, a system that has been deployed in one of the largest Internet companies to fulfil ...
The rise of big data has led to new demands for machine learning (ML) systems to learn complex model...
This paper describes a high-performance sampling architecture for inference of latent topic models ...
Machine learning (ML) is a cornerstone of the new data revolution. Most attempts to scale machine le...
We present the design and implementation of GLDA, a library that utilizes the GPU (Graphics Processi...