New architectures for extreme-scale computing need to be designed for higher energy efficiency than current systems. The DOE-funded Traleika Glacier architecture is a recently proposed extreme-scale manycore that radically simplifies the architecture and proposes a cluster-based on-chip memory hierarchy without hardware cache coherence. Programming for such an environment, which can use scratchpads or incoherent caches, is challenging. Hence, this thesis focuses on architecting, programming, and evaluating an on-chip incoherent multiprocessor memory hierarchy. It starts by examining incoherent multiprocessor caches. It proposes ISA support for data movement in such an environment, and two relatively user-friendly programming app...
With the advancement of design and fabrication of high-performance integrated circuits technology, i...
One of the major limiters to computer system performance has been the access to main memory, wh...
Today’s supercomputers are built from state-of-the-art components to extract as much performance...
Multi-core processors have become the dominant processor architecture with 2, 4, and 8 cores on a ch...
This work describes a cache architecture and memory model for 1000+ core microprocessors. Our appro...
Optimizing memory references has been a primary research area of computer systems ever since the adv...
As multicore systems become widespread, both software and hardware face a major challenge in efficie...
This dissertation addresses two sets of challenges facing processor design as the industry enters th...
Caches have long been used to reduce memory access latency. However, the increased complex...
The growing computing demands of emerging application domains such as Recognition/Mining/Synthesis (...
One of the main goals of computer architecture design is to improve performance without mu...
This thesis describes the efficient design of a future many-core processor that can provide higher p...
University of Minnesota Ph.D. dissertation. September 2014. Major: Computer Science. Advisor: Pen-Ch...
Chip multiprocessors (CMPs) have become virtually ubiquitous due to the increasing impact of power a...