Current work in Simultaneous Multithreading provides little benefit to programs that aren't partitioned into threads. We propose Simultaneous Subordinate Microthreading (SSMT) to correct this by spawning subordinate threads that perform optimizations on behalf of the single primary thread. These threads, written in microcode, are issued and executed concurrently with the primary thread. They directly manipulate the microarchitecture to improve the primary thread's branch prediction accuracy, cache hit rate, and prefetch effectiveness. Each of these improvements contributes to the performance of the primary thread. This paper introduces SSMT and discusses its potential to increase performance. We illustrate its usefulness with an SSMT machine that execute...
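As a rough illustration of how a subordinate thread could raise the primary thread's branch prediction accuracy, the sketch below compares a textbook table of 2-bit saturating counters against the same table when a hypothetical microthread precomputes a data-dependent branch condition ahead of the primary thread and supplies it as a hint. The names `TwoBitPredictor`, `subordinate_precompute`, and `run`, and the hint-override mechanism itself, are assumptions made for illustration only; they are not the SSMT design described in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch PC (textbook scheme)."""
    table: dict = field(default_factory=dict)

    def predict(self, pc: int) -> bool:
        # Counter values 0-1 predict not-taken, 2-3 predict taken.
        # Unseen branches start at 1 (weakly not-taken).
        return self.table.get(pc, 1) >= 2

    def update(self, pc: int, taken: bool) -> None:
        c = self.table.get(pc, 1)
        self.table[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def subordinate_precompute(data):
    # Hypothetical microthread: evaluates the branch's condition slice on the
    # same input data ahead of the primary thread and records each outcome.
    return [x % 2 == 0 for x in data]


def run(outcomes, pc, predictor, hints=None):
    """Count correct predictions for successive instances of one branch.

    When `hints` (assumed to be written by a subordinate microthread) is given,
    it overrides the counter's prediction; the counter is still trained.
    """
    correct = 0
    for i, taken in enumerate(outcomes):
        guess = hints[i] if hints is not None else predictor.predict(pc)
        correct += guess == taken
        predictor.update(pc, taken)
    return correct


if __name__ == "__main__":
    data = list(range(8))
    outcomes = [x % 2 == 0 for x in data]  # alternating taken / not-taken
    print(run(outcomes, 0x40, TwoBitPredictor()))  # 0
    print(run(outcomes, 0x40, TwoBitPredictor(), subordinate_precompute(data)))  # 8
```

The alternating pattern is a worst case for a single 2-bit counter starting in the weakly not-taken state, so every prediction misses; the precomputed hints remove all of those misses, which is the kind of help a helper thread running alongside the primary thread is meant to provide.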
In this paper, we examined the behavior of three of the best-performing branch prediction strategies...
In this paper, we propose Runahead Threads (RaT) as a valuable solution for both reducing resource c...
Simultaneous multithreaded processors use shared on-chip caches, which yield better cost-p...
Tomorrow's ultra-wide microprocessors will be unable to supply enough work from single-threaded prog...
Simultaneous Multithreading (SMT) has been proposed for improving processor throughput by overlappin...
Integration of multiple processor cores on a single die, relatively constant die sizes, increasing m...
Simultaneous Multithreading (SMT) is proposed to improve pipeline throughput by overlapping executio...
capable of executing instructions from multiple threads in the same cycle. SMT in fact was introduce...
Compiler optimizations are often driven by specific assumptions about the underlying architecture an...
Simultaneous multithreading (SMT) allows multiple threads to supply instructions to the instruction ...
This paper examines simultaneous multithreading, a technique permitting several independent threads...
This paper proposes a dynamic cache partitioning method for simultaneous multithreading systems. Un...
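The abstract above is cut off before it describes its mechanism, so the following is not that paper's method. It is a minimal sketch of one common way to partition a shared set-associative cache between two SMT threads: profile each thread's LRU stack distances, derive how many hits it would get with k ways, and pick the way split that maximizes total hits. The function names `stack_distance_hits` and `best_split` are hypothetical, and the model assumes a single fully associative LRU set per thread.

```python
def stack_distance_hits(trace, max_ways):
    """Return hits[k] for k = 0..max_ways: hits an LRU set with k ways would get.

    Uses the LRU inclusion property: an access whose stack distance is d
    (d distinct blocks touched since its last use) hits iff the set has > d ways.
    """
    stack = []  # index 0 is the most recently used block
    hist = [0] * max_ways
    for addr in trace:
        if addr in stack:
            d = stack.index(addr)
            if d < max_ways:
                hist[d] += 1
            stack.remove(addr)
        stack.insert(0, addr)
    hits = [0] * (max_ways + 1)
    for k in range(1, max_ways + 1):
        hits[k] = hits[k - 1] + hist[k - 1]
    return hits


def best_split(hits_a, hits_b, total_ways):
    """Choose (ways_a, ways_b), each at least 1, maximizing combined hits."""
    if total_ways < 2:
        raise ValueError("need at least one way per thread")
    best = None
    for a in range(1, total_ways):
        b = total_ways - a
        score = hits_a[a] + hits_b[b]
        if best is None or score > best[0]:
            best = (score, a, b)
    return best[1], best[2]


if __name__ == "__main__":
    hits_a = stack_distance_hits([0, 1, 2, 3] * 4, 6)  # cycles over 4 blocks
    hits_b = stack_distance_hits([100] * 16, 6)        # reuses a single block
    print(best_split(hits_a, hits_b, 6))               # (4, 2)
```

With these two hypothetical traces, an even 3/3 split gives thread A no hits at all because its 4-block loop thrashes in 3 ways, while a 4/2 split lets it hit on every reuse without costing thread B anything; a dynamic scheme would repeat this choice periodically as the profiles change.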