Reads and writes to global data in off-chip RAM can limit the performance achieved with HLS tools, as each access takes multiple cycles and usually blocks progress in the application state machine. This can be combated by using data prefetchers, which hide access time by predicting the next memory access and loading it into a cache before it's required. Unfortunately, current prefetchers are only useful for memory accesses with known regular patterns, such as walking arrays, and are ineffective for those that use irregular patterns over application-specific data structures. In this work, we demonstrate prefetchers that are tailor-made for applications, even if they have irregular memory accesses. This is achieved through program slicing, a ...
High-level synthesis (HLS) automatically transforms high-level programs in a language such as C/C++ ...
A well-known performance bottleneck in computer architecture is the so-called memory wall. This term...
In the last century great progress was achieved in developing processors with extremely high computa...
As the gap between processor and memory speeds widens, program performance is increasingly dependent...
Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting propositi...
Runahead execution improves processor performance by accurately prefetching long-latency memory acce...
Modern processors and compilers hide long memory latencies through non-blocking loads or explicit so...
Dynamically scheduled high-level synthesis (HLS) enables the use of load-store queues (LSQs) which c...
Modern computer systems spend a substantial fraction of their running time waiting for data from...
Memory latency is a major factor in limiting CPU performance, and prefetching is a well-k...
The speed gap between processors and memory system is becoming the performance bottle...
The memory wall places a significant limit on performance for many modern workloads. These applicati...
A major performance limiter in modern processors is the long latencies caused by data cache misses. ...