Abstract: Simultaneous multithreaded (SMT) processors use data caches which are dynamically shared between threads. Depending on the processor workload, sharing the data cache may harm performance due to excessive cache conflicts. A way to overcome this problem is to physically partition the cache between threads. Unfortunately, partitioning the cache requires additional hardware and may lead to lower utilisation of the cache in certain workloads. It is therefore important to consider software mechanisms to implicitly partition the cache between threads by controlling the locations in the cache in which each thread can load data. This paper proposes standard program transformations for partitioning the shared data caches of SMT processors, ...
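The idea of implicitly partitioning a shared cache in software — controlling which cache locations each thread's data can occupy — can be sketched with a simple set-index model. Everything below (cache geometry, the half-and-half coloring) is an illustrative assumption, not the paper's actual transformation:

```python
# Hedged sketch: software cache partitioning by controlling which sets
# each thread's data maps to. Cache parameters and the coloring scheme
# are assumptions for illustration, not taken from the paper.

LINE_SIZE = 64          # bytes per cache line (assumed)
NUM_SETS = 1024         # sets in an assumed direct-mapped cache

def cache_set(addr):
    """Set index of a byte address in the assumed cache."""
    return (addr // LINE_SIZE) % NUM_SETS

def sets_touched(base, nbytes):
    """All cache sets a contiguous [base, base + nbytes) region maps to."""
    return {cache_set(a) for a in range(base, base + nbytes, LINE_SIZE)}

region = NUM_SETS * LINE_SIZE   # bytes that cover every set exactly once

# Unpartitioned layout: both threads' buffers share the same set
# alignment, so their lines evict each other (conflict misses).
t0 = sets_touched(0 * region, region // 2)
t1 = sets_touched(4 * region, region // 2)   # same alignment as t0
assert t0 & t1                               # heavy overlap

# "Colored" layout: offsetting thread 1's buffer by half the region
# confines each thread to a disjoint half of the sets, implicitly
# partitioning the cache without any hardware support.
t1_colored = sets_touched(4 * region + region // 2, region // 2)
assert not (t0 & t1_colored)                 # disjoint halves, no conflicts
```

The padding offset here plays the role of the program transformation: by shifting a thread's data layout, it restricts that thread's footprint to a subset of cache sets, so co-running threads no longer thrash each other's lines.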
In today's multi-core systems, cache contention due to true and false sharing can cause unexpected a...
Once the cache memory was introduced in computer systems, the well-known gap in speeds between the m...
Multithreaded architectures context switch to another instruction stream to hide the latency of memo...
Abstract. Simultaneous multithreaded processors use shared on-chip caches, which yield better cost-p...
This paper proposes a dynamic cache partitioning method for simultaneous multi-threading systems. Un...
Abstract—Multi-threaded applications execute their threads on different cores with their own local c...
At the level of multi-core processors that share the same cache, data sharing among threads which be...
A modern high-performance multi-core processor has large shared cache memories. However, simultaneou...
Simultaneous Multithreading (SMT) has emerged as an effective method of increasing utilization of re...
Abstract—Resizable caches can trade off capacity for access speed to dynamically match the needs of...
This paper presents Cooperative Cache Partitioning (CCP) to allocate cache resources among threads c...
The limitation imposed by instruction-level parallelism (ILP) has motivated the use of thread-level ...
In the multithread and multicore era, programs are forced to share part of the processor structures....
The limitation imposed by instruction-level parallelism (ILP) has motivated the use of thread-level ...
This thesis answers the question whether a scheduler needs to take into account where communicating...