The Tensor Contraction Engine (TCE) is a compiler that translates high-level mathematical tensor contraction expressions into efficient, parallel Fortran code. Two of the TCE's optimizations, fusion and tiling, have proven successful at minimizing disk-to-memory traffic for dense tensor computations. While the TCE's other optimizations are specific to tensor contraction expressions, these two model-driven, search-based optimization algorithms could also be useful for optimizing handwritten dense array computations to minimize disk-to-memory traffic. In this thesis, we show how to apply the loop fusion algorithm to handwritten code in a procedural language. While in the TCE the loop fusion algorithm operated on high-level expres...
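As a minimal illustration of the loop fusion transformation named in the abstract, the sketch below contrasts two handwritten loops with their fused form. It is not the TCE's model-driven search algorithm, which decides which loops to fuse; it only shows the effect of one fusion on the size of an intermediate. The function names `unfused` and `fused` and the arithmetic are hypothetical, chosen for illustration. In an out-of-core setting the intermediate array would be the data written to and read back from disk, which is the traffic that fusion can eliminate.

```python
def unfused(a, b):
    """Two separate loops: the intermediate t is a full array."""
    n = len(a)
    t = [0.0] * n              # intermediate array of size n
    for i in range(n):         # producer loop fills t
        t[i] = a[i] * b[i]
    s = 0.0
    for i in range(n):         # consumer loop reads t back
        s += t[i] + 1.0
    return s


def fused(a, b):
    """Producer and consumer merged into one loop: t shrinks to a scalar."""
    s = 0.0
    for i in range(len(a)):
        t = a[i] * b[i]        # intermediate is now a single value
        s += t + 1.0           # consumed immediately, never stored as an array
    return s


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [4.0, 5.0, 6.0]
    # Both versions compute the same result; only the storage differs.
    print(unfused(a, b), fused(a, b))
```

Both functions return the same value for the same inputs. The fused version never materializes the array `t`, so its intermediate storage drops from `n` elements to one.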
Loop fusion is a program transformation that merges multiple loops into one and is an effective opti...
Compiler optimization is a long-standing research field that enhances program performance with a set...
Improving data locality of tensor data structures is a crucial optimization for maximizing the perfo...
Complex tensor contraction expressions arise in accurate electronic structure models in quantum chem...
On modern processors, data transfer exceeds floating-point operations as the predominant cost in man...
A large number of scientific and engineering applications are highly data-intensive, operating on d...
As the demand increases for high performance and power efficiency in modern computer runtime systems...
Over the past 20 years, increases in processor speed have dramatically outstripped performance incre...
This paper describes an approach to synthesis of efficient out-of-core code for a class of...
A safe basis for automatic loop parallelization is the polyhedron model which represents the iterati...
many benefits in simplifying array-based computations and expressing data parallelism. However, they...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/18...
This paper presents a technique for memory optimization for a class of computations that arises in t...
Exploiting parallelism in loops in programs is an important factor in realizing the potential perfor...