Parallel programming has been available for a few decades on clusters of computers (sets of interconnected computers with shared or distributed memory); more recently, it has become available on multicore CPUs and GPUs (Graphics Processing Units). Parallel programming has proven very useful in science and engineering applications for reducing sequential execution time, through parallel numerical libraries for clusters, such as PBLAS and ScaLAPACK, which rely on the sequential numerical libraries BLAS and LAPACK. Parallel numerical libraries have also been developed for GPUs, such as CUBLAS and CULA (based on BLAS and LAPACK), built on the CUDA programming platform created by NVIDIA. CUDA aims to exploit the potential of GPUs. This report...