This work analyzes the performance of the basic vector operations AXPY, DOT and SpMV using OpenCL. The code was tested on an NVIDIA Tesla S2050 GPU and an Intel Xeon Phi 3120A coprocessor. Due to the nature of the AXPY function, only two versions were implemented: a routine executed by the CPU and a kernel executed on the two devices. Their performance was studied for different vector sizes. The results show that the NVIDIA architecture is better suited to the smaller vector sizes and the Intel architecture to the larger ones. For the DOT and SpMV functions, three versions were implemented. The first is the CPU routine, the second one is an OpenCL kernel that uses local memory a...
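For readers unfamiliar with the three operations, the following is a minimal sequential sketch of what a CPU reference routine for each might compute. It is not the authors' code: the function names `axpy`, `dot` and `spmv_csr` are hypothetical, and the CSR (compressed sparse row) storage for SpMV is an assumption, since the abstract does not state which sparse format was used.

```python
from typing import List, Sequence


def axpy(alpha: float, x: Sequence[float], y: List[float]) -> None:
    """Update y in place: y[i] <- alpha * x[i] + y[i]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(y)):
        y[i] += alpha * x[i]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the inner product sum(x[i] * y[i])."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    acc = 0.0
    for xi, yi in zip(x, y):
        acc += xi * yi
    return acc


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Return y = A @ x for a sparse matrix A stored in CSR form.

    Assumed layout (hypothetical, not taken from the paper):
    row_ptr[r]..row_ptr[r + 1] delimits the nonzeros of row r,
    col_idx[k] is the column of nonzero k, values[k] is its value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y
```

AXPY is independent per element, which is consistent with the abstract's remark that it needs only one device kernel. DOT and each SpMV row end in a sum over many products, so a parallel version has to combine partial results across work-items.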
One of the key kernels in scientific applications is the Sparse Matrix Vector Multiplication (SMVM)....
OpenCL has been proposed as a means of accelerating functional computation using FPGA and GPU accele...
In our study, we present the results of the implementation of the SHA-512 algorithm in FPGAs. The di...
Achieving optimal performance on the latest multi-core and many-core architectures depends more and ...
Abstract—The paper presents results of several experiments evaluating the performance of NVIDIA proc...
The trend of using co-processors as accelerators to perform certain tasks is rising in the parallel...
In this work, we evaluate OpenCL as a programming tool for developing performance-portable applicati...
Abstract—GPUs have been successfully used for acceleration of many mathematical functions and librar...
Abstract—Augmenting a processor with special hardware that is able to apply a Single Instruction to ...
Accelerators such as the Graphic Processing Unit (GPU) have increasingly seen use by the science and...
GPU acceleration is the concept of accelerating the execution speed of an application by running it ...
The application of accelerators in HPC applications has seen enormous growth in the last decade. In ...
In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Inte...