The Gram-Schmidt method is a classical method for computing QR decompositions and is widely used in computational physics, for example in the orthogonalization of quantum mechanical operators and in Lyapunov stability analysis. In this paper, we discuss how well the Gram-Schmidt method performs on different hardware architectures, including state-of-the-art GPUs and CPUs. We explain in detail how a careful interplay between hardware and software can speed up these compute-intensive applications, and we discuss the benefits and disadvantages of several approaches. In addition, we compare highly optimized standard routines of the BLAS libraries against our own optimized routines on both processor types. Par...
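For readers unfamiliar with the method, the following is a minimal sketch of a QR decomposition by Gram-Schmidt orthogonalization. It rests on several assumptions: the abstract does not say which variant the authors use, so the sketch picks the modified Gram-Schmidt ordering, which is generally the more numerically stable one; it is plain, unoptimized Python and not the authors' GPU or CPU routines; and the function name `gram_schmidt_qr` and the row-list matrix layout are hypothetical choices made for illustration.

```python
import math


def gram_schmidt_qr(A):
    """Return (Q, R) with A = Q * R, using modified Gram-Schmidt.

    A is an m x n matrix given as a list of rows, with m >= n and
    linearly independent columns. Q is m x n with orthonormal columns,
    R is n x n upper triangular. Illustrative sketch only.
    """
    m, n = len(A), len(A[0])
    # Copy the columns of A; each one is orthogonalized in place.
    V = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    Q_cols = []
    R = [[0.0] * n for _ in range(n)]

    for j in range(n):
        # Normalize the current (already orthogonalized) column.
        norm = math.sqrt(sum(x * x for x in V[j]))
        if norm == 0.0:
            # Only exact dependence is caught here; a production code
            # would compare against a tolerance instead.
            raise ValueError("columns are linearly dependent")
        R[j][j] = norm
        q = [x / norm for x in V[j]]
        Q_cols.append(q)

        # Remove the q component from every remaining column right away.
        # This immediate update is what distinguishes the modified
        # variant from the classical one.
        for k in range(j + 1, n):
            R[j][k] = sum(a * b for a, b in zip(q, V[k]))
            V[k] = [b - R[j][k] * a for a, b in zip(q, V[k])]

    # Convert the list of columns back into a list of rows.
    Q = [[Q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R
```

Calling `gram_schmidt_qr([[1, 1], [1, 0], [0, 1]])` returns a 3 x 2 matrix `Q` with orthonormal columns and a 2 x 2 upper triangular `R`. The inner loop over `k` consists of dot products and scaled vector subtractions, which is the kind of work that BLAS routines cover.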
In this paper we compare GPU-based implementations of three metaheuristics: Particle Swarm Optimizat...
The recent dramatic progress in machine learning is partially attributed to the availability of high...
Multiple independent runs of an evolutionary algorithm in parallel are often used to increase the ef...
High-performance computing is one of the most demanding technologies in today's computational wor...
The paper introduces an optimized multicore CPU implementation of the genetic algorithm and compares...
As Central Processing Units (CPUs) and Graphical Processing Units (GPUs) get progressively better, d...
Combined execution times in seconds for serial and parallel implementations of k-nearest neighbor and r...
This work reviews the experience of implementing different versions of the SSPR rank-one update oper...
Nowadays a technique of using graphics processing units (GPUs) for general-purpose computing (or GPG...
In the past decade, FPGAs and GPUs have become increasingly common as hardware accelerators when dea...
Graphics Processing Units (GPU) are increasingly being used for general-purpose programming, instead...
The high performance computing community has traditionally focused uniquely on the reduction of exec...