Scalable shared-memory multiprocessors are often slowed down by long-latency memory accesses. One way to cope with this problem is to use data forwarding to overlap memory accesses with computation. With data forwarding, when a processor produces a datum, in addition to updating its cache, it sends a copy of the datum to the caches of the processors that the compiler identified as consumers of it. As a result, when the consumer processors access the datum, they find it in their caches. This paper addresses two main issues. First, it presents a framework for a compiler algorithm for forwarding. Second, using address traces, it evaluates the performance impact of different levels of support for forwarding. Our simulations of a 32-processor ma...
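The forwarding mechanism described above (the producer updates its own cache and also pushes copies of the datum into the caches of the consumers identified by the compiler) can be sketched in plain C. The cache model, the latency numbers, and the write_and_forward() helper below are illustrative assumptions, not the paper's actual hardware or compiler interface; this is a minimal sketch of the idea only.

/*
 * Toy model of producer-initiated data forwarding.
 * Caches are tiny direct-mapped tables; all names and costs are hypothetical.
 */
#include <stdio.h>

#define NPROC 4
#define CACHE_LINES 8

typedef struct {
    int tag[CACHE_LINES];    /* -1 means the line is empty */
    int value[CACHE_LINES];
} cache_t;

static cache_t cache[NPROC];
static int memory[64];       /* shared memory, indexed by address */

static void cache_init(void) {
    for (int p = 0; p < NPROC; p++)
        for (int i = 0; i < CACHE_LINES; i++)
            cache[p].tag[i] = -1;
}

/* Ordinary write: update shared memory and the producer's own cache only. */
static void write_local(int proc, int addr, int val) {
    int idx = addr % CACHE_LINES;
    memory[addr] = val;
    cache[proc].tag[idx] = addr;
    cache[proc].value[idx] = val;
}

/* Write-and-forward: additionally push a copy of the datum into the caches
 * of the consumer processors that the compiler identified. */
static void write_and_forward(int proc, int addr, int val,
                              const int *consumers, int nconsumers) {
    write_local(proc, addr, val);
    for (int c = 0; c < nconsumers; c++) {
        int idx = addr % CACHE_LINES;
        cache[consumers[c]].tag[idx] = addr;
        cache[consumers[c]].value[idx] = val;
    }
}

/* Read: a cache hit is cheap; a miss models a long-latency remote access. */
static int read_value(int proc, int addr, int *latency) {
    int idx = addr % CACHE_LINES;
    if (cache[proc].tag[idx] == addr) {
        *latency = 1;                 /* local hit */
        return cache[proc].value[idx];
    }
    *latency = 100;                   /* remote miss (illustrative cost) */
    cache[proc].tag[idx] = addr;
    cache[proc].value[idx] = memory[addr];
    return memory[addr];
}

int main(void) {
    int consumers[] = {1, 2};         /* consumers assumed known at compile time */
    int lat, v;

    cache_init();
    write_and_forward(0, 5, 42, consumers, 2);

    v = read_value(1, 5, &lat);
    printf("P1 reads %d with latency %d (forwarded, so it hits)\n", v, lat);
    v = read_value(3, 5, &lat);
    printf("P3 reads %d with latency %d (not a consumer, so it misses)\n", v, lat);
    return 0;
}

In this toy model, a forwarded line turns the consumer's first access into a local hit, which is the overlap of communication with computation that the abstract describes.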
The dataflow paradigm frees the designer to focus on the functionality of an a...
This paper proposes a mechanism for reducing the complexity of forwarding hardware in VLIW...
We have developed compiler algorithms that analyze coarse-grained, explicitly parallel programs and ...
As the difference in speed between processor and memory system continues to increase, it is...
This dissertation considers the use of data prefetching and an alternative mechanism, data forwardin...
Memory forwarding is an effective way to dynamically optimize the data layout. It provides a safe wa...
Task-based programming models are increasingly being adopted due to their ability to express paralle...
The transition to multi-core architectures can be attributed mainly to fundamental limitations in cl...
By optimizing data layout at run-time, we can potentially enhance the performance of caches by acti...
Multiprocessors with shared memory are considered more general and easier to program than message-pa...
In this paper, we discuss access forwarding schemes for the replication that achieve balanced access...
Large-scale multiprocessors suffer from long latencies for remote accesses. Caching is by far the mo...
Shared memory systems generally support consumer-initiated communication; when a process needs data,...
This paper discusses some of the issues involved in implementing a shared-address space programming ...