Recently, CPUs with an identical ISA tend to have different microarchitectures, different computation resources, and special instructions. To achieve efficient program execution on such hardware, compilers provide machine-dependent code optimization. However, software vendors cannot adopt this optimization for software production, since the software is widely distributed and must therefore be executable on any machine with the same ISA. On the other hand, there is a significant gap between a processor's operational speed and memory access speed, and the gap is currently increasing. In this paper, we introduce several special prefetch instructions that are suited for memory access patterns that frequently appear in program execution....
Traditional software controlled data cache prefetching is often ineffective due to the lack of runti...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...
Binary recompilation and translation play an important role in computer systems today. It is used by...
Recently, reconfigurable architectures, which outperform DSP processors, have become important. Alth...
Software prefetching and locality optimizations are techniques for overcoming the gap between proces...
Binary translation and dynamic optimization are widely used to provide compatibility betwee...
Software prefetching and locality optimizations are techniques for overcoming the gap between proc...
Despite rapid increases in CPU performance, the primary obstacles to achieving higher performance in...
Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting propositi...
Dynamic binary translation is the process of translating and optimizing executable code for one mach...
Software prefetching and locality optimizations are techniques for overcoming the speed gap between ...
A major performance limiter in modern processors is the long latencies caused by data cache misses. ...
Software prefetching and locality optimizations are two techniques for overcoming the speed gap betw...