This paper presents an efficient compilation technique for generating the local memory access sequences of block-cyclically distributed array references with affine subscripts in data-parallel programs. When an array reference with an affine subscript appears within a two-level nested loop, its memory accesses exhibit repetitive patterns at both the outer and the inner loop. We use tables to record the memory accesses of these repetitive patterns. Based on these tables, a new start-computation algorithm is proposed to compute the starting elements on a processor for each outer loop iteration. The table constructions have complexity O(k+s2), where k is the distribution block size and s2 is the access stride for the inner loop. After tables...
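To make the setting of this abstract concrete, the following is a minimal brute-force sketch, not the table-based algorithm the paper proposes. It assumes a one-dimensional array distributed block-cyclically with block size `k` over `p` processors, and a reference of the form `A[offset + s1*i + s2*j]` inside a two-level loop. The function names, the parameters `p`, `s1`, `offset`, `n1`, `n2`, and the local addressing scheme are illustrative assumptions, not taken from the paper.

```python
def owner_and_local(g, k, p):
    """Map a non-negative global index g to (owner, local address).

    Assumes a block-cyclic distribution with block size k over p
    processors, with each processor storing its blocks contiguously.
    """
    block = g // k                      # global block number
    owner = block % p                   # blocks are dealt round-robin
    local = (block // p) * k + g % k    # local block base + offset in block
    return owner, local


def local_access_sequence(proc, k, p, n1, n2, s1, s2, offset=0):
    """Enumerate, per outer iteration, the local addresses touched on proc.

    Brute force: every global index offset + s1*i + s2*j is tested for
    ownership. A table-driven scheme would avoid this full scan by
    exploiting the repetition in the access pattern.
    """
    rows = []
    for i in range(n1):                 # outer loop
        row = []
        for j in range(n2):             # inner loop with stride s2
            g = offset + s1 * i + s2 * j
            owner, local = owner_and_local(g, k, p)
            if owner == proc:
                row.append(local)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical parameters: k = 4, p = 2, inner stride s2 = 3.
    for i, row in enumerate(local_access_sequence(0, 4, 2, 2, 4, 1, 3)):
        print("outer iteration", i, "-> local addresses", row)
```

The first element of each inner list is the starting element on that processor for the given outer iteration, which is the quantity the start-computation algorithm in the abstract is said to compute without scanning every index.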
Array redistribution is usually required to enhance algorithm performance in many parall...
An algorithm for mapping an arbitrary, multidimensional array onto an arbitrarily shaped multidimens...
The bandwidth mismatch between processor and main memory is one major limiting problem. Although str...
Address generation for compiling programs written in HPF to executable SPMD code is an...
Data-parallel languages, such as High Performance Fortran, are designed to make programming of distr...
An important research topic for parallelizing compilers is generating local memory access sequences ...
Arrays are mapped to processors through a two-step process, alignment followed by distribution, in...
Keywords: scratch pad memory, affine reference. This paper considers compiler management of fast, local memorie...
This paper presents compilation techniques used to compress holes, which are caused by the nonunit a...
This paper presents compilation techniques to compress holes, which are caused by the non-unit align...
This paper presents a technique for finding good distributions of arrays and suitable loop restructu...
We present new techniques for compilation of arbitrarily nested loops with affine dependences for di...
This paper addresses the problem of efficient mappings of nested loops, and more generally of system...
We investigate the technique of storing multiple array elements in the same memory cell, with ...