Orchestrating On-Chip Memory Resources for Throughput-Oriented Compilation

Publication date

January 2012

Abstract

A key factor in GPU performance efficiency is the number of active threads that can run simultaneously on each streaming multi-processor. The state of each active thread is kept in fast memory, so the thread can be scheduled quickly whenever the set of running threads stalls on memory latency. The more active threads we have, the higher the utilization we can obtain from many-core processor pipelines. To achieve optimal utilization, we typically need many more active threads than physical cores. Because on-chip memory resources, including registers and scratch-pad memory, are limited, and because every thread gets an equal partition of those resources, the number of active threads depends on the characteris...
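The resource trade-off described in the abstract can be illustrated with a small calculation. The sketch below is not the paper's method; it is a simplified model of how per-thread register use and per-block scratch-pad use bound the number of active threads on one multi-processor. The function name `active_threads_per_sm` and the default hardware limits (32768 registers, 49152 bytes of scratch-pad memory, 1536 resident threads) are assumptions chosen for illustration, and the model ignores allocation granularity and per-SM block-count limits that real hardware imposes.

```python
def active_threads_per_sm(regs_per_thread, smem_per_block, threads_per_block,
                          regs_per_sm=32768, smem_per_sm=49152,
                          max_threads_per_sm=1536):
    """Estimate the number of threads that can be resident on one SM.

    All hardware limits are illustrative assumptions, not values taken
    from the paper. Every thread gets the same register budget and every
    block the same scratch-pad budget, so the tightest resource decides
    how many whole blocks fit.
    """
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")

    # Upper bound from the scheduler's resident-thread limit.
    blocks_by_threads = max_threads_per_sm // threads_per_block

    # Upper bound from the register file.
    regs_per_block = regs_per_thread * threads_per_block
    blocks_by_regs = (regs_per_sm // regs_per_block
                      if regs_per_block > 0 else blocks_by_threads)

    # Upper bound from scratch-pad (shared) memory.
    blocks_by_smem = (smem_per_sm // smem_per_block
                      if smem_per_block > 0 else blocks_by_threads)

    blocks = min(blocks_by_threads, blocks_by_regs, blocks_by_smem)
    return blocks * threads_per_block


if __name__ == "__main__":
    # Compare a light and a register-heavy kernel under the same assumed limits.
    print(active_threads_per_sm(20, 4096, 256))
    print(active_threads_per_sm(40, 4096, 256))
```

Under these assumed limits, doubling the registers used per thread halves the number of active threads, which is the kind of dependence on kernel characteristics that the abstract points to.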
