Manycore processors, with tens to hundreds of tiny cores but no hardware-based cache coherence, can offer tremendous peak throughput on highly parallel programs while being complexity and energy efficient. Manycore processors can be combined with a few high-performance big cores for executing operating systems, legacy code, and serial regions. These systems use heterogeneous cache coherence (HCC) with hardware-based cache coherence between big cores and software-centric cache coherence between tiny cores. Unfortunately, programming these heterogeneous cache-coherent systems to enable collaborative execution is challenging, especially when considering dynamic task parallelism. This paper seeks to address this challenge using a combination of...
In embedded system-on-a-chip (SoC) applications, the demand for integrating heterogeneous processors...
Computing workloads often contain a mix of interactive, latency-sensitive foreground applications a...
This paper considers a large scale, cache-based multiprocessor that is interconnected by a hierarchi...
Since the end of Dennard’s scaling, computer architects have fully embraced parallelism to ...
Architects have adopted the shared memory model that implicitly manages cache coherence and cache ca...
This work describes a cache architecture and memory model for 1000+ core microprocessors. Our appro...
Emerging task-based parallel programming models shield programmers from the daunting task of paralle...
Computational task DAGs are executed on parallel computers by a task scheduling algorithm. Intellige...
Multi-core processors have become the dominant processor architecture with 2, 4, and 8 cores on a ch...
In a parallel computing context, peak performance is hard to reach with irregu...
With increasing core counts, the scalability of directory-based cache coherence has become a challen...
Reducing memory latency is critical to the performance of large-scale parallel systems. Due to the t...