The emergence of multicore and heterogeneous architectures requires many linear algebra algorithms to be redesigned to take advantage of accelerators such as GPUs. A particularly challenging class of problems, arising in numerous applications, involves linear algebra operations on many small matrices. The matrices are usually all the same size, up to a few hundred, and there can be thousands or even millions of them. Compared with large matrix problems, whose abundant data-parallel computation is well suited to GPUs, small matrix problems are challenging because of their low computational intensity, their large fraction of sequential operations, and the high PCI-E overhead. These challenges entail redesigning the algorithms instead...
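The low computational intensity mentioned above can be made concrete with a rough model. The sketch below is illustrative only and is not taken from the paper: the function names `gemm_arithmetic_intensity` and `batched_matmul` are hypothetical, the matrices are assumed to hold 8-byte doubles, and each matrix is assumed to cross the memory interface exactly once. Under those assumptions an n x n multiply performs 2n^3 floating-point operations on 3n^2 elements, so the ratio of flops to bytes grows only linearly with n and stays small for small matrices.

```python
def gemm_arithmetic_intensity(n, bytes_per_element=8):
    """Flops per byte for one n x n product C = A * B (idealized model).

    Assumption: A and B are each read once and C is written once.
    """
    flops = 2 * n ** 3                          # n^3 multiplies + n^3 adds
    bytes_moved = 3 * n ** 2 * bytes_per_element
    return flops / bytes_moved


def batched_matmul(a_batch, b_batch):
    """Reference (CPU, pure Python) batched product of equal-sized square matrices.

    Each pair (A, B) is an independent small problem; a batched GPU routine
    would process these pairs concurrently instead of one after another.
    """
    results = []
    for a, b in zip(a_batch, b_batch):
        n = len(a)
        c = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        results.append(c)
    return results


if __name__ == "__main__":
    for n in (12, 120, 1200):
        print(n, gemm_arithmetic_intensity(n))
```

In this model the intensity is n/12 flops per byte for doubles, which is one way to see why a single small matrix cannot keep a GPU busy and why many of them are grouped into one batched call.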
A GPU accelerated approach to numerical linear algebra and matrix analysis with CFD applications is ...
The high performance computing (HPC) community is obsessed over the general matrix-matrix multiply (...
Modern graphics processing units (GPUs) have been at the leading edge of increasing chip-level para...
A challenging class of problems arising in many GPU applications, called batched problems, involves ...
In the past, we could rely on technology scaling and new micro-architectural techniques to impro...
A current trend in high-performance computing is to decompose a large linear algebra problem into ba...
Solving a large number of relatively small linear systems has recently drawn more attention ...
Dense linear algebra (DLA) is one of the seven most important kernels in high performance computing. ...
Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are major buildin...
The objective of high performance computing (HPC) is to ensure that the computational power of hardw...
If multicore is a disruptive technology, try to imagine hybrid multicore systems enhanced with accel...
This paper presents a novel, high-performance, graphical processing unit-based algorithm for efficie...
One-sided dense matrix factorizations are important computational kernels in many scientific...
Optimization algorithms are becoming increasingly more important in many areas, such as fin...