This work presents modulo unrolling without unrolling (modulo unrolling WU), a message aggregation method for parallel loops with affine array accesses in message passing programs written in Chapel, a Partitioned Global Address Space (PGAS) parallel programming language. Each message incurs a non-trivial runtime overhead, a significant component of which is independent of the message size, so aggregating messages improves performance. Our optimization builds on modulo unrolling, a technique pioneered by Barua [1] to ensure that each memory reference has a single, statically predictable tile number on tiled architectures such as the MIT Raw Machine [2]. Modulo unrolling WU applies ...
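The benefit of aggregation described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a cyclic distribution in which element `i` lives on locale `i % num_locales`, and a hypothetical linear cost model in which every message costs a fixed overhead `alpha` plus `beta` per element. The names `owner`, `group_by_owner`, `message_cost`, `alpha`, and `beta` are illustrative only.

```python
from collections import defaultdict


def owner(index, num_locales):
    """Locale owning element `index` under an assumed cyclic distribution."""
    return index % num_locales


def group_by_owner(indices, num_locales):
    """Bucket the accessed indices by the locale that owns them."""
    buckets = defaultdict(list)
    for i in indices:
        buckets[owner(i, num_locales)].append(i)
    return dict(buckets)


def message_cost(num_messages, num_elements, alpha, beta):
    """Assumed cost model: fixed overhead per message plus per-element cost."""
    return num_messages * alpha + num_elements * beta


num_locales = 4
here = 0  # the locale executing the loop

# Affine access A[i + 1] for i in 0..15
accessed = [i + 1 for i in range(16)]
buckets = group_by_owner(accessed, num_locales)

# Only elements owned by other locales require communication
remote = {loc: idxs for loc, idxs in buckets.items() if loc != here}
n_remote = sum(len(idxs) for idxs in remote.values())

# One message per remote element versus one message per remote locale
unaggregated = message_cost(n_remote, n_remote, alpha=10.0, beta=1.0)
aggregated = message_cost(len(remote), n_remote, alpha=10.0, beta=1.0)

print(buckets)
print(unaggregated, aggregated)
```

Under these assumptions, each bucket holds indices separated by a stride of `num_locales`, so the owner of every reference in a bucket is known statically. That is the property modulo unrolling relies on, and it lets the per-element messages to a locale be replaced by a single strided transfer.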
The high performance of today’s microprocessors is achieved mainly by fast, multiple-issuing hardw...
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
In order to deliver the promise of Moore's Law to the end user, compilers must make decisions that ar...
This paper presents modulo unrolling without unrolling (modulo unrolling WU), a method for message ...
• Improve the runtime of certain types of parallel computers – In particular, message passing comput...
The divergence of application behavior from optimal network usage leads to performance bottlenecks i...
Partitioned Global Address Space (PGAS) languages promise to deliver improved programmer productivi...
We introduce Approximate Unrolling, a loop optimization that reduces execution time and energy consu...
This article studies an important open problem in backend compilation regardin...
Modulo Variable Expansion (MVE) [1] used with software pipelining (SWP) may ...
This paper improves our previous research effort [1] by providing an efficient...
115 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1997. This dissertation also demons...
Partitioned global address space (PGAS) languages like UPC or Fortran provide a global name space to...
Software pipelining is a powerful technique to expose fine-grain parallelism, ...
Clustered organizations are becoming a common trend in the design of VLIW architectures. In this wor...