Data-parallel operations are widely used in games, multimedia, physics, and other data-intensive scientific applications. Unlike control parallelism, data parallelism arises from simultaneous operations across large collection-oriented data sets such as vectors and matrices. A simple implementation can use OpenMP directives to execute operations on multiple data elements concurrently. However, this implementation introduces many barriers between data-parallel operations, and even within a single data-parallel operation, to synchronize the concurrent threads. This synchronization cost may overwhelm the benefit of data parallelism. Moreover, barriers preclude many optimization opportunities across parallel regions. In this paper, we describe an approach...
The decrease of the performance gain dictated by Moore's Law boosted the devel...
Poor scalability on parallel architectures can be attributed to several factors, among which idle ti...
We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using Ope...
Increased programmability for concurrent applications in distributed systems requires automatic supp...
146 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2008. My work discusses various str...
We introduce shared-memory parallelism in a parallel distributed-memory solver...
The performance of a High Performance Parallel or Distributed Computation depends heavily on minimiz...
With ubiquitous multi-core architectures, a major challenge is how to effectively use these machines...
We discuss efficient shared memory parallelization of sparse matrix computatio...
As the microprocessor industry embraces multicore architectures, inherently parallel applications be...
Research on programming distributed memory multiprocessors has resulted in a well-understood program...
Multicore and many-core architectures have penetrated the vast majority of computing systems, from h...
This paper advances the state-of-the-art in programming models for exploiting task-level parallelism...
Distributed Memory Multicomputers (DMMs) such as the IBM SP-2, the Intel Paragon and the Thinking Ma...
While the growing number of cores per chip allows researchers to solve larger scientific and enginee...