Modern mobile GPUs integrate an increasing number of shader cores to speed up the execution of graphics workloads. Each core integrates a private Texture Cache, backed up by a shared L2 cache, to apply texturing effects on objects. However, as in any other memory hierarchy, such an organization produces data replication in the upper levels (i.e., the private Texture Caches) to allow for faster accesses, at the expense of reducing their overall effective capacity. For example, in a mobile GPU with four shader cores, about 84.6% of the requested texture blocks are replicated in at least one of the other private Texture Caches. This paper proposes a novel dynamically mapped Non-Uniform Cache Architecture (NUCA) organization for the private Texture ...
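The replication effect described above can be made concrete with a toy model. The sketch below assumes fully associative LRU private caches and a hypothetical trace of `(core, block)` requests. The names `LRUCache` and `replication_stats` are illustrative. The metric is also an assumption: a request counts as replicated when its block is resident in another core's cache right after the access. This is not necessarily the methodology behind the 84.6% figure.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fully associative private cache with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block address -> None, oldest first

    def access(self, block):
        """Return True on a hit; on a miss, fill the block (evicting LRU if full)."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = None
        return False


def replication_stats(trace, num_cores, capacity):
    """Replay (core, block) requests through per-core private caches.

    Returns (replicated_fraction, effective_capacity), where
      - replicated_fraction: share of requests whose block is also resident
        in at least one *other* private cache right after the access;
      - effective_capacity: distinct resident blocks / total cache slots
        at the end of the trace (1.0 means no duplication).
    Both definitions are assumptions made for this illustration.
    """
    if not trace:
        return 0.0, 0.0
    caches = [LRUCache(capacity) for _ in range(num_cores)]
    replicated = 0
    for core, block in trace:
        caches[core].access(block)
        if any(block in c.blocks for i, c in enumerate(caches) if i != core):
            replicated += 1
    distinct = set().union(*(c.blocks.keys() for c in caches))
    return replicated / len(trace), len(distinct) / (num_cores * capacity)
```

With two cores of two blocks each, the trace `[(0, 1), (1, 1), (0, 2), (1, 3)]` yields `(0.25, 0.75)`. One of the four requests fetches a block already held by the other core, and the duplicated block 1 leaves only three distinct blocks in four slots. This is the loss of effective capacity that a shared, NUCA-style organization aims to recover.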
It is commonplace for graphics processing units or GPUs today to render extremely complex 3D scenes ...
Traditional graphics hardware architectures implement what we call the push architecture for texture...
Hardware texture mapping is essential for real-time rendering. Unfortunately the memory bandwidth an...
With increasing interest in sophisticated graphics capabilities in mobile systems, energy...
Portable devices often demand powerful processors to run computing-intensive applications, such as v...
© 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for a...
The Last Level Cache (LLC) is a key element to improve application performance in multi-cores. To ha...
In future multi-cores, large amounts of delay and power will be spent accessing data...
Emerging GPU applications exhibit increasingly high computation demands, which have led GPU manufactur...
This article proposes a novel micro-architecture approach for mobile GPUs aimed at early removing th...
Cache working-set adaptation is key as embedded systems move to multiprocessor and Simultaneous Mult...
Cache replacement policies are known to have an important impact on hit rates. The OPT replacement p...
The design of mobile GPUs is all about saving energy. Smartphones and tablets are battery-operated a...
Video games and simulators commonly use very detailed textures, whose cumulative size is often large...
Current GPU computing models support a mixture of coherent and incoherent classes of memory operatio...