Abstract. One of the major drawbacks of computing with graphics adapters is the limited memory available for relevant problem sizes. To overcome this limitation for the ViennaCL library, we investigate a partitioning approach for one of the standard benchmark problems in High-Performance Computing (HPC), namely the dense matrix-matrix product. We apply this partitioning approach to problems exceeding the available memory on graphics adapters. Moreover, we investigate its applicability to distributed memory systems by using the Message Passing Interface (MPI). Our approach is presented in detail and benchmark results are given.
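As a rough illustration of what partitioning a dense matrix-matrix product can look like, the following minimal sketch computes C = A * B one block at a time, so that only three blocks need to be held in a small memory at any moment. It is not the authors' implementation and does not use ViennaCL or MPI; the function name `tiled_matmul`, the square block shape, and the `tile` parameter are assumptions made purely for illustration.

```python
def tiled_matmul(A, B, tile):
    """Compute C = A * B block by block (hypothetical helper, plain Python lists)."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # row block of A and C
        for j0 in range(0, m, tile):      # column block of B and C
            for p0 in range(0, k, tile):  # block along the shared dimension
                # Assumption: on a device, only the current blocks of A, B
                # and C would have to be resident in memory at this point.
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a = A[i][p]
                        for j in range(j0, min(j0 + tile, m)):
                            C[i][j] += a * B[p][j]
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = tiled_matmul(A, B, tile=2)
```

The result does not depend on `tile`; the block size only controls how much data is touched at once, which is the quantity a memory-limited device would care about.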
The proliferation of high performance workstations and the emergence of high speed networks have att...
This paper presents and analyzes two different strategies of heterogeneous distribution of computati...
2012 IEEE 26th Parallel and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW), Shang...
The problem of partitioning dense matrices into sets of sub-matrices has received increased attentio...
We present a distributed-memory library for computations with dense structured matrices. A matrix is...
We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and mul...
Abstract. The functional performance model (FPM) of heterogeneous processors has proven to be more ...
Abstract. In this paper, we present a novel algorithm of optimal matrix partitioning for parallel de...
In this paper, we consider the problem of partitioning a square into a set of zones of prescribed ar...
Abstract. In this paper, we address the issue of implementing matrix multiplication on heterogeneous ...
Proceedings of the 8th IEEE International Conference on Cluster Computing (Cluster 2006), October, 2...
This paper is aimed at designing efficient parallel matrix-product algorithms for heterogeneous mast...
This paper is focused on designing efficient parallel matrix-product algorithm...
The current state and foreseeable future of high performance scientific computing (HPC) can be descr...
In this paper, we address the issue of implementing matrix-matrix multiplication on heterogene...