This thesis describes a high-performance implementation technique for Multilisp's "future" parallelism construct. This method addresses the non-uniform memory access (NUMA) problem inherent in large-scale shared-memory multiprocessors. The technique is based on lazy task creation (LTC), a dynamic task partitioning mechanism that dramatically reduces the cost of task creation and consequently makes it possible to exploit fine-grained parallelism. In LTC, idle processors get work to do by "stealing" tasks from other processors. A previously proposed implementation of LTC is the shared-memory (SM) protocol. The main disadvantage of the SM protocol is that it requires the stack to be cached suboptimally on cache-incoheren...
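The stealing idea mentioned in this abstract can be illustrated with a minimal, single-threaded sketch. It is a generic work-stealing simulation, not the thesis's SM protocol or its proposed alternative. The names `Worker`, `pop_local`, `steal_from`, and `run` are hypothetical, and the policy that an owner takes its newest task while a thief takes the victim's oldest task is an assumption made for illustration.

```python
from collections import deque


class Worker:
    """Hypothetical worker that owns a double-ended queue of pending tasks."""

    def __init__(self, name):
        self.name = name
        self.tasks = deque()  # right end holds the newest task

    def push(self, task):
        self.tasks.append(task)

    def pop_local(self):
        # The owner takes its newest task (LIFO order).
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):
        # Assumed policy: a thief takes the victim's oldest task.
        return victim.tasks.popleft() if victim.tasks else None


def run(workers):
    """Round-robin simulation; returns (worker name, task) pairs in execution order."""
    log = []
    progress = True
    while progress:
        progress = False
        for w in workers:
            task = w.pop_local()
            if task is None:
                # An idle worker looks for a victim that still has work.
                for v in workers:
                    if v is not w:
                        task = w.steal_from(v)
                        if task is not None:
                            break
            if task is not None:
                log.append((w.name, task))
                progress = True
    return log


busy, idle = Worker("busy"), Worker("idle")
for t in ["t1", "t2", "t3", "t4"]:
    busy.push(t)
log = run([busy, idle])
```

In this run the idle worker never creates tasks of its own; it only executes tasks taken from the opposite end of the busy worker's queue, so owner and thief do not compete for the same end.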
Dynamic task-parallel programming models are popular on shared-memory systems,...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/16...
Architects have adopted the shared memory model that implicitly manages cache coherence and cache ca...
A future is a simple abstraction mechanism for exposing potential concurrency in programs. In this p...
Many parallel algorithms are naturally expressed at a fine level of granularity, often finer than a ...
The arrival of multi-core processors or chip multiprocessors (CMP) operated with symmetrical multiproce...
The currently dominant programming models to write software for multicore processors use threads tha...
We introduce explicit multi-threading (XMT), a decentralized architecture that exploits fine-grained...
This paper addresses the problem of universal synchronization primitives that can support scalable th...
Since the end of Dennard’s scaling, computer architects have fully embraced parallelism to ...
It is possible to implement the parallel random access machine (PRAM) on a chip multiprocessor (CMP)...
Constructing high-performance computing systems and providing an easy-to-use programming model for users...
This paper reviews some important issues for scalability in programming and future trends with man...
A software distributed shared memory (DSM) system allows shared memory parallel programs to execute ...
We are currently investigating two different approaches to scalable shared memory: Munin, a distribut...