The Decoupled Access/Execute (DAE) approach reduces the energy consumption of task-based programs by dividing each task into two phases: an access phase that prefetches data at a low CPU frequency, followed by an execute phase that performs the computation at a high CPU frequency. The goal of this project is to extend this approach to sequential programs and to examine the benefits of tailoring the access phase to both the architecture the program runs on and the program input. Using a Just-In-Time compiler to optimise the program dynamically and profiling tools to obtain runtime information, we examined the possible benefits of the DAE approach on sequential programs with optimised access phases. We compared the b...
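The two-phase split described above can be illustrated with a minimal C sketch. It is not the project's implementation: the function names, the chunked loop structure, and the `set_frequency` hook are all hypothetical, and the hook only records the requested level instead of changing the CPU frequency. The one real facility used is `__builtin_prefetch`, a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: a real DAE runtime would request a frequency change
   here. This stub only records the request so the sketch is self-contained. */
enum freq_level { FREQ_LOW, FREQ_HIGH };
static enum freq_level current_freq = FREQ_HIGH;
static void set_frequency(enum freq_level f) { current_freq = f; }

/* Access phase: no computation, only prefetches of the data that the
   execute phase will read. */
static void access_phase(const double *a, const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        __builtin_prefetch(&a[idx[i]], 0 /* read */, 3 /* high locality */);
}

/* Execute phase: the original computation, ideally hitting in the cache. */
static double execute_phase(const double *a, const size_t *idx, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[idx[i]];
    return sum;
}

/* Indirect sum over a[idx[i]], processed chunk by chunk (an assumption of
   this sketch), alternating the two phases for every chunk. */
double dae_indirect_sum(const double *a, const size_t *idx,
                        size_t n, size_t chunk) {
    assert(chunk > 0);
    double total = 0.0;
    for (size_t start = 0; start < n; start += chunk) {
        size_t len = (n - start < chunk) ? n - start : chunk;
        set_frequency(FREQ_LOW);   /* memory-bound phase */
        access_phase(a, idx + start, len);
        set_frequency(FREQ_HIGH);  /* compute-bound phase */
        total += execute_phase(a, idx + start, len);
    }
    return total;
}
```

In this sketch the `chunk` parameter stands in for the kind of knob that could be tuned to the target architecture and input, for example so that one chunk's data fits in a given cache level; how the project actually tunes its access phases is not shown here.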
Recently, CPUs with an identical ISA tend to have different microarchitectures, different computatio...
Modern processors apply sophisticated techniques, such as deep cache hierarchies and hardware prefet...
Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting propositi...
Energy efficiency is one of the biggest challenges in modern computer architecture. Increased perfor...
Computer architecture design faces an era of great challenges in an attempt to simultaneously improv...
Processor performance has increased far faster than memories have been able to keep up with, forcing...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Decoupled Access-Execute (DAE) is an innovative approach to optimize energy consumption of computer p...
Software level optimization for compilers has become a major research field. Dynamic Voltage Frequenc...
Energy efficiency is becoming a highly significant topic regarding modern hardware. The need for dec...
A major performance limiter in modern processors is the long latencies caused by data cache misses. ...
The large latency of memory accesses in modern computer systems is a key obstacle to achieving high ...
As the gap between processor and memory speeds widens, program performance is increasingly dependent...
There has been intensive research on data prefetching focusing on performance improvement; however, ...
Given the increasing gap between processors and memory, prefetching data into cache become...