Abstract—On many-core processors that do not provide hardware cache coherence, using shared memory in parallel computations is challenging. Reverting to pure message passing would avoid consistency issues, but replicating large shared datasets by messages is less efficient than accessing them directly through shared memory. The TACO-MESH framework provides lightweight remote method calls and shared objects with software-managed consistency. This paper presents experience from porting a graph partitioning algorithm to the framework. A performance evaluation on the experimental Intel SCC processor, which has no hardware cache coherence, shows that parallelization can be efficient despite the overhead of software-level consistency manageme...
We describe an efficient software cache consistency mechanism for shared memory multiprocessors that...
With the advancement of design and fabrication of high-performance integrated circuits technology, i...
Programming abstractions to simplify distributed parallel computing have been widely adopted. Yet, i...
Abstract. Parallel graph reduction is a model for parallel program execution in which shared-memory...
Parallel graph reduction is a conceptually simple model for the concurrent evaluation of lazy functi...
Algorithms operating on a graph setting are known to be highly irregular and unstructured. This le...
Abstract. Parallel functional programs based on the graph reduction execution model display consider...
Increased programmability for concurrent applications in distributed systems requires automatic supp...
Single chip multicore processors are now prevalent and processors with hundreds of cores are being p...
The transition from single processor to shared memory multi-processors (or shared memory multi-core ...
Reordering instructions and data layout can bring significant performance improvement for memory bou...
A wide variety of computer architectures have been proposed to exploit parallelism at different gran...
The Single-chip Cloud Computer (SCC) is an experimental multicore processor created by Intel Labs fo...
During the last few years many different memory consistency protocols have been proposed. These rang...
Shared-memory multiprocessor systems can achieve high performance levels when appropriate work paral...