While parallel computer architectures have become mainstream, developing applications for them remains challenging. New tools, languages and programming models are needed. In addition, little is known about how parallel implementations of basic but important operations, such as the QR decomposition of a matrix, perform on current commercial manycore architectures. This paper evaluates a high-level dataflow language (CAL), a source-to-source compiler (Cal2Many) and three QR decomposition algorithms (Givens Rotations, Householder and Gram-Schmidt). The algorithms are implemented both in CAL and in hand-optimized C, executed on Adapteva's Epiphany manycore architecture and evaluated with respect to performance, ...
A systolic array provides an alternative computing paradigm to the von Neumann architecture...
Interprocessor communication often dominates the runtime of large matrix computations. We present a ...
Interprocessor communication often dominates the runtime of large matrix compu...
Because of an imbalance between computation and memory speed in modern processors, programm...
The arrival of manycore systems enforces new approaches for developing applications in order to expl...
Dataflow programming is emerging as a promising technology for programming of parallel systems, such...
This paper introduces a new parallel QR decomposition algorithm. The novel load balancing method des...
Demand for more system functionality and computational resources is increasing exponentially. Avail...
The parallel computer CM-200 consists of a very large number of simple processors connected in a m...
The advent of multicore processors represents a disruptive event in the histor...
Dataflow programming is emerging as a promising technology for programming of parallel systems,...
Dataflow programming has received increasing attention in the age of multicore and heterog...