Scientific computations have used GPU-enabled computers successfully, often relying on distributed nodes to overcome the limitations of device memory. Only a handful of text mining applications benefit from such infrastructure. Since the initial steps of text mining are typically data intensive, and the ease of deployment of algorithms is an important factor in developing advanced applications, we introduce a flexible, distributed, MapReduce-based text mining workflow that performs I/O-bound operations on CPUs with industry-standard tools and then runs compute-bound operations on GPUs; the GPU operations are optimized to ensure coalesced memory access and effective use of shared memory. We have performed extensive tests of our algorithms on a clust...
Through the algorithmic design patterns of data parallelism and task parallelism, the graphics proces...
Distributed computing technologies allow a wide variety of tasks that use large amounts of data to b...
In an attempt to increase the performance/cost ratio, large compute clusters are becoming heterogene...
Scientific computations have been using GPU-enabled computers successfully, often relying on distri...
Probabilistic Latent Semantic Analysis (PLSA) has been successfully applied to many text mining task...
In this age, a huge amount of data is generated every day by human interactions with services. Disco...
In this paper we introduce a MapReduce-based implementation of self-organizing maps that performs co...
Graph Pattern Mining (GPM) extracts higher-order information in a large graph by searching for small...
Many real-world applications are capable of producing continuous, infinite streams of data. During t...
We present two efficient Apriori implementations of Frequent Itemset Mining (FIM) that utilize new-g...
As Machine Learning (ML) applications are becoming ever more pervasive, fully-trained systems are ma...
As ML applications are becoming ever more pervasive, fully-trained systems are made increasingly ava...
We present GPMR, our MapReduce library that leverages the power of GPU clusters for large-scale comp...
Data mining tools may be computationally demanding, so there is an increasing interest in pa...
The exponential increase in the generation and collection of data has led us into a new era o...