Adaptive Loop Tiling for a Multi-Cluster CMP

Jisheng Zhao
Matthew Horsnell
Ian Rogers
Chris Kirkham
Ian Watson

Publication date

August 2015

Abstract

Loop tiling is a fundamental optimization for improving data locality. Selecting the right tile size, combined with the parallelization of loops, can provide additional performance gains on modern Chip MultiProcessor (CMP) architectures. This paper presents a runtime optimization system that automatically parallelizes loops and searches empirically for the best tile sizes on a scalable multi-cluster CMP. The system is built on top of a virtual machine and targets the runtime parallelization and optimization of Java programs. Experimental results show that runtime parallelization and tile size searching can improve performance for two BLAS kernels and one Lattice-Boltzmann simulation, despite the overheads.
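To make the two ideas in the abstract concrete, the following is a minimal sketch in plain Java of loop tiling applied to a matrix multiply, together with a naive empirical search over candidate tile sizes. It is not the authors' system: the class name TileSearchSketch, the method names, the candidate list, and the timing strategy are all assumptions made for illustration, and the sketch omits the runtime parallelization and virtual machine integration the paper describes.

```java
// Hypothetical illustration only; names and structure are assumptions,
// not taken from the paper.
final class TileSearchSketch {

    // Keeps results observable so the timed work is not optimized away.
    static double sink;

    // Reference (untiled) multiply: c = a * b for n x n row-major arrays.
    static double[] multiplyNaive(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
        return c;
    }

    // Tiled multiply: the same arithmetic, with the iteration space split
    // into tile x tile x tile blocks so each block's data can stay in cache.
    static double[] multiplyTiled(double[] a, double[] b, int n, int tile) {
        if (tile <= 0) {
            throw new IllegalArgumentException("tile must be positive");
        }
        double[] c = new double[n * n];
        for (int ii = 0; ii < n; ii += tile) {
            int iMax = Math.min(ii + tile, n);
            for (int kk = 0; kk < n; kk += tile) {
                int kMax = Math.min(kk + tile, n);
                for (int jj = 0; jj < n; jj += tile) {
                    int jMax = Math.min(jj + tile, n);
                    for (int i = ii; i < iMax; i++) {
                        for (int k = kk; k < kMax; k++) {
                            double aik = a[i * n + k];
                            for (int j = jj; j < jMax; j++) {
                                c[i * n + j] += aik * b[k * n + j];
                            }
                        }
                    }
                }
            }
        }
        return c;
    }

    // Empirical search: time each candidate tile size and return the one
    // with the fastest observed run (best of `repeats` attempts).
    static int searchTileSize(double[] a, double[] b, int n,
                              int[] candidates, int repeats) {
        int best = candidates[0];
        long bestTime = Long.MAX_VALUE;
        for (int tile : candidates) {
            long fastest = Long.MAX_VALUE;
            for (int r = 0; r < repeats; r++) {
                long start = System.nanoTime();
                double[] c = multiplyTiled(a, b, n, tile);
                long elapsed = System.nanoTime() - start;
                sink += c[0];
                fastest = Math.min(fastest, elapsed);
            }
            if (fastest < bestTime) {
                bestTime = fastest;
                best = tile;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 256;
        double[] a = new double[n * n];
        double[] b = new double[n * n];
        for (int i = 0; i < n * n; i++) {
            a[i] = (i % 7) + 1;
            b[i] = (i % 5) - 2;
        }
        int[] candidates = {8, 16, 32, 64, 128};
        int best = searchTileSize(a, b, n, candidates, 3);
        System.out.println("Fastest tile size in this run: " + best);
    }
}
```

The tile size chosen depends on the machine, the problem size, and JIT warm-up, so the result can vary between runs. A runtime system like the one the abstract describes would also have to weigh the cost of the search against the time it saves.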
