During the first decade of the twenty-first century, multicore processing reached maturity with the help of shared-memory programming models such as OpenMP [1], which allow both legacy and new C and Fortran applications to be parallelized in shared-memory environments. Meanwhile, message-passing programming models such as MPI [2] made it possible to aggregate multicore systems into larger clusters, which dominated the TOP500 supercomputing list [3]. However, at that time parallel computing seemed to face limits that were hard to overcome. Physical limits prevented clock frequencies from increasing, and the Law of Diminishing Returns reduced the benefit of adding ever more cores to a multiprocessor. Suddenly, the advent of GPU co...