Matrix Factorization (MF) has been widely applied in machine learning and data mining. Because MF is computationally expensive, we aim to improve the efficiency of SGD-based MF by exploiting the massive parallel processing power of heterogeneous multiprocessors. The main challenge for parallel SGD on heterogeneous CPU-GPU systems is the task-assignment strategy. We design a novel strategy that divides the matrix into a set of blocks based on two considerations. First, we observe that the matrix should be divided nonuniformly, with relatively large blocks assigned to GPUs so that their computing power is saturated. In addition to exploiting the characteristics of hardware, the workloads assigned to two types of ...
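The excerpt does not give the exact partitioning rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each device's relative speed is known in advance (the `speeds` values and all function names are hypothetical). Rows of the rating matrix are cut into nonuniform bands sized by speed, so the faster GPU receives a larger block, while column bands stay uniform. Blocks are then scheduled in a DSGD-style rotation so that the blocks processed together share no rows or columns. For illustration, the "devices" run sequentially here.

```python
import bisect
import random

def proportional_bounds(n, speeds):
    """Cut [0, n) into contiguous bands whose sizes are roughly proportional
    to each device's relative speed, so a faster device gets a larger band."""
    total = sum(speeds)
    bounds, acc = [0], 0.0
    for s in speeds[:-1]:
        acc += s
        bounds.append(round(n * acc / total))
    bounds.append(n)  # the last band always ends exactly at n
    return bounds

def band_of(x, bounds):
    """Index j of the band [bounds[j], bounds[j+1]) that contains x."""
    return bisect.bisect_right(bounds, x) - 1

def sgd_block(entries, P, Q, lr, reg):
    """Plain SGD with L2 regularization over the observed (u, i, r) of one block."""
    for u, i, r in entries:
        pu, qi = P[u], Q[i]
        err = r - sum(a * b for a, b in zip(pu, qi))
        for k in range(len(pu)):
            a, b = pu[k], qi[k]
            pu[k] += lr * (err * b - reg * a)
            qi[k] += lr * (err * a - reg * b)

def hetero_sgd_epoch(entries, n_users, n_items, speeds, P, Q, lr=0.01, reg=0.02):
    """One epoch over a d x d block grid (d = number of devices).
    Row bands are nonuniform (sized by the hypothetical `speeds`);
    column bands are uniform."""
    d = len(speeds)
    row_b = proportional_bounds(n_users, speeds)
    col_b = proportional_bounds(n_items, [1] * d)
    grid = [[[] for _ in range(d)] for _ in range(d)]
    for u, i, r in entries:
        grid[band_of(u, row_b)][band_of(i, col_b)].append((u, i, r))
    # In sub-epoch s, device j owns block (j, (j + s) % d). These d blocks touch
    # disjoint rows of P and disjoint rows of Q, so they could run concurrently;
    # this sketch simply runs them one after another.
    for s in range(d):
        for j in range(d):
            sgd_block(grid[j][(j + s) % d], P, Q, lr, reg)
    return row_b

def rmse(entries, P, Q):
    se = sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in entries)
    return (se / len(entries)) ** 0.5

if __name__ == "__main__":
    random.seed(0)
    n_users, n_items, k = 200, 100, 4
    speeds = [1.0, 1.0, 6.0]  # hypothetical: two CPU workers, one GPU
    entries = [(random.randrange(n_users), random.randrange(n_items),
                random.uniform(1, 5)) for _ in range(5000)]
    P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    print("RMSE before:", round(rmse(entries, P, Q), 3))
    for _ in range(20):
        row_b = hetero_sgd_epoch(entries, n_users, n_items, speeds, P, Q)
    print("row bands:", row_b)  # the GPU (last device) gets the largest band
    print("RMSE after:", round(rmse(entries, P, Q), 3))
```

Because column bands stay uniform and the rotation gives each device a different column band in every sub-epoch, the blocks scheduled together never share a user or an item. This property is what allows block-based SGD to update the factor matrices on several devices without locking.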
Abstract. The functional performance model (FPM) of heterogeneous processors has proven to be more ...
Low-rank matrices arise in many scientific and engineering computations. Both computational and stor...
Background: Heterogeneous parallel computing systems utilize the combination of different resources ...
Most supercomputers are shipped with both a CPU and a GPU. With the powerful parallel computing capa...
We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and mul...
Matrix factorization is one of the fundamental techniques for analyzing the latent relationships between ...
Abstract. One-sided dense matrix factorizations are important computational kernels in many scientific...
Abstract. In this paper, we present a novel algorithm of optimal matrix partitioning for parallel de...
Background: In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained ...
Block-structured matrices arise in several contexts in circuit simulation problems. These matrice...
The need for efficient and scalable big-data analytics methods is more essential than ever due to th...
Matrix factorization is known to be an effective method for recommender systems that are given only ...
There is an increased interest in building machine learning frameworks with advanced algebraic capab...
For many finite element problems, when represented as sparse matrices, iterative solvers are found t...
Hybrid GPU/CPU clusters are becoming very popular in the scientific computing community, as attested...