A major challenge in fine-grained computing is achieving locality without excessive scheduling overhead. We built two J-Machine implementations of a fine-grained programming model, the Berkeley Threaded Abstract Machine. One implementation takes an Active Messages approach, maintaining a scheduling hierarchy in software to improve data cache performance. The other relies on the J-Machine’s message queues and fast task switch, lowering control costs at the expense of data locality. Our analysis measures the costs and benefits of each approach for a variety of programs and cache configurations. The Active Messages implementation is strongest when miss penalties are high and for the finest-grained programs. The hardware-bu...
It is often assumed that computational load balance cannot be achieved in parallel and distributed s...
Data locality is central to modern computer designs. The widening gap between processor speed and me...
The memory system is the key to performance in contemporary computer systems. When designing a new m...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...
This paper describes a method to improve the cache locality of sequential programs by scheduling fin...
On recent high-performance multiprocessors, there is a potential conflict between the goals of achie...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cac...
© 1994 ACM. In the past decade, processor speed has become significantly faster than memory speed. S...
The speed of processors increases much faster than the memory access time. This makes memory accesse...
Given the large communication overheads characteristic of modern parallel machines, optimizations th...
Safe languages provide programming abstractions, like type and memory safety, to improve programmer ...
Cache memories were inv...
This thesis presents a systematic study of two modes of program execution: synchronous and asynchron...
Lightweight threads have become a common abstraction in the field of programming languages and opera...
Manual memory management is error prone. Some of the errors it causes, in particular memory leaks an...