Efficient use of the memory hierarchy is critical for achieving high performance in a multiprocessor system-on-chip. An external memory that is shared between processors is a bottleneck in current and future systems. Cache misses and a large cache miss penalty contribute to low processor utilisation. In this paper, we describe a novel cache optimisation technique to reduce instruction and data cache misses for streaming applications. Instruction and data locality are improved by executing a task multiple times before moving to the next task. Furthermore, we introduce a dataflow model that is used to trade off the number of cache misses against end-to-end latency and memory usage. For our industrial application, which is a Digital Rad...
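The idea of executing a task several times in a row before switching to the next task can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method or model: it assumes a fully associative LRU cache, and the task names, per-task footprints, cache size, and the helpers `count_misses` and `make_schedule` are all hypothetical values chosen for illustration.

```python
from collections import OrderedDict


def count_misses(schedule, footprints, cache_lines):
    """Count misses of a fully associative LRU cache for a task schedule.

    schedule    -- sequence of task names in execution order
    footprints  -- assumed map: task name -> cache-line ids touched per execution
    cache_lines -- cache capacity in lines (assumed value)
    """
    cache = OrderedDict()
    misses = 0
    for task in schedule:
        for line in footprints[task]:
            if line in cache:
                cache.move_to_end(line)        # hit: refresh LRU position
            else:
                misses += 1
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used line
    return misses


def make_schedule(tasks, iterations, batch):
    """Run each task `batch` times in a row before moving to the next task."""
    schedule = []
    for _ in range(iterations // batch):
        for task in tasks:
            schedule.extend([task] * batch)
    return schedule


# Hypothetical workload: three tasks, each touching four distinct cache lines,
# so the combined working set (12 lines) exceeds the assumed 8-line cache.
footprints = {
    "A": [0, 1, 2, 3],
    "B": [4, 5, 6, 7],
    "C": [8, 9, 10, 11],
}
tasks = ["A", "B", "C"]

interleaved = make_schedule(tasks, iterations=8, batch=1)  # A B C A B C ...
batched = make_schedule(tasks, iterations=8, batch=4)      # A A A A B B B B ...

interleaved_misses = count_misses(interleaved, footprints, cache_lines=8)
batched_misses = count_misses(batched, footprints, cache_lines=8)
```

Both schedules execute every task eight times. Under these assumed parameters the interleaved schedule misses on every access (96 misses), while the batched schedule misses only on the first execution of each batch (24 misses). A larger batch also means that more data must be buffered between tasks and that results are produced later, which is the kind of trade-off against end-to-end latency and memory usage that the abstract mentions.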
The speed of processors increases much faster than the memory access time. This makes memory accesse...
In this paper, we quantify the effect that fine-grained multistreamed interaction of threads within ...
On multicore processors, applications are run sharing the cache. This paper presents onlin...
As processor speeds continue to increase, the memory bottleneck remains a primary impediment to atta...
In the world of complex SoCs for consumer applications, multiprocessor architectures usually deploy...
This paper considers the problem of scheduling streaming applications on uniprocessors in order to m...
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Compute...
There are two competing models for the on-chip memory in Chip Multiprocessor (CMP) systems: hardware...
Of late, there has been considerable interest in models, algorithms and methodologies specificall...
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasing...
Coherent read misses in shared-memory multiprocessors account for a substantial fraction of executio...
Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the r...
The contribution of memory latency to execution time continues to increase, and latency hid...