Data prefetching via helper threading has been extensively investigated on Simultaneous Multi-Threading (SMT) or Virtual Multi-Threading (VMT) architectures. Although helper threads can reportedly hide large cache latencies at runtime, most techniques rely on hardware support to reduce the context-switch overhead between the main thread and the helper thread, and on static profile feedback to construct the helper thread code. This paper develops a new solution that exploits helper-threaded prefetching through dynamic optimization on the latest UltraSPARC Chip-Multiprocessing (CMP) processor. Our experiments show that by utilizing the otherwise idle processor core, a single user-level helper thread is sufficient to improve the runtim...
Pre-execution is a novel latency-tolerance technique where one or more helper threads run in front o...
This paper focuses on the instruction fetch resources in a real-time SMT processor to prov...
Memory bandwidth is a crucial resource in computing systems. Current CMP/SMT processors have a signi...
Heterogeneous Many Cores (HMC) architectures that mix many simple/small cores...
Scaling the performance of applications with little thread-level parallelism is one of the most seri...
Multicore processors have become ubiquitous in today's computing platforms, extending from smartphon...
Simultaneous Multithreading (SMT) has been proposed for improving processor throughput by overlappin...
In data-intensive applications of Cloud Computing such as XML parsing, large graph traver...
This survey covers the general idea behind helper threads and the major ways in which they are imple...
This paper describes future execution (FE), a simple hardware-only technique to accelerate individu...
Technology scaling trends have forced designers to consider alternatives to deeply pipelining aggres...
Hardly predictable data addresses in many irregular applications have rendered prefetching ineffec...
A single parallel application running on a multi-core system shows sub-linear speedup becau...
Heterogeneous Many Cores (HMC) architectures that mix many simple/small cores with a few complex/lar...
Simultaneous multithreading (SMT) allows multiple hardware threads to execute concurrently on a proc...