We introduce the SMTp architecture: an SMT processor augmented with a coherence protocol thread context which, together with a standard integrated memory controller, can enable the design of (among other possibilities) scalable cache-coherent hardware distributed shared memory (DSM) machines from commodity nodes. We describe the minor changes needed to a conventional out-of-order multi-threaded core to realize SMTp, discussing issues related to both deadlock avoidance and performance. We then compare SMTp performance to that of various conventional DSM machines with normal SMT processors, both with and without integrated memory controllers. On configurations from 1 to 32 nodes, with 1 to 4 application threads per node, we find that SMTp deliv...
This paper presents Threaded Multi-Path Execution (TME), which exploits existing hardware on a Simul...
In this paper, we propose Runahead Threads (RaT) as a valuable solution for both reducing resource c...
The increasing hardware complexity of dynamically scheduled superscalar processors may compromise th...
As computing power has increased over the past few decades, science and engineering have found more ...
Previous work in scalable hardware distributed shared memory (DSM) multiprocessors has established t...
To achieve high performance, contemporary computer systems rely on two forms of parallelism: instruc...
Simultaneous Multithreading (SMT) has been proposed for improving processor throughput by overlappin...
Simultaneous multithreading (SMT) is an architectural technique that allows for the parallel executi...
Modern processors are designed to achieve greater amounts of instruction-level parallelism (ILP) and ...
Simultaneous Multithreading (SMT) is proposed to improve pipeline throughput by overlapping executio...
A simultaneous multithreading (SMT) processor can issue instructions from several threads every cycl...