Model-guided Empirical Tuning of Loop Fusion

International Journal of High Performance Systems Architecture, Vol. x, No. x, xxxx

Related items

A Parametrized Loop Fusion Algorithm for Improving Parallelism and Cache Locality

Sharad K. Singhai
Kathryn S. McKinley

January 1997

Loop fusion is a reordering transformation that merges multiple loops into a single loop. It can inc...

Lecture Notes on Loop Transformations for Cache Optimization 15-411: Compiler Design

André Platzer

January 2010

In this lecture we consider loop transformations that can be used for cache optimization. The transf...

Improving Effective Bandwidth through Compiler Enhancement of Global and Dynamic Cache Reuse

Ding, Chen

January 2000

This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19...

Parameterizing loop fusion for automated empirical tuning

Zhao, Y
Yi, Q
Kennedy, K
Quinlan, D
Vuduc, R

December 2005

Traditional compilers are limited in their ability to optimize applications for different architectu...

Improving Data Locality with Loop Transformations

McKinley, Kathryn S.
Carr, Steve
Tseng, Chau Wen

January 1996

In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...

Compiler Optimizations for Improving Data Locality

Carr, Steve
McKinley, Kathryn S.
Tseng, Chau Wen

January 1994

In the past decade, processor speed has become significantly faster than memory speed. Small, fast c...

Fusion of Loops for Parallelism and Locality

Naraig Manjikian
Tarek S. Abdelrahman

January 1995

Loop fusion improves data locality and reduces synchronization in data-parallel applications. Howeve...

Improving memory hierarchy performance through combined loop interchange and multi-level fusion

Qing Yi
Ken Kennedy

January 2002

Because of the increasing gap between the speeds of processors and main memories, compilers must enh...

Loop Fusion for Data Locality and Parallelism

Sharad Singhai
Kathryn McKinley

January 1996

Modern processors use memory hierarchy of several levels. Achieving high performance mandates the ef...

Model-guided empirical optimization for memory hierarchy

Chen, Chun

April 2007

We are facing an increasing performance gap between processor and memory speed on today'...

Compiler optimizations for improving data locality

Carr, Steve
McKinley, Kathryn S.
Tseng, Chau Wen

November 1994

Pruning the Optimization Search Space Using Architecture-aware Cost Models

Apan Qasem
Ken Kennedy

January 2015

In recent years, a number of strategies have emerged for empirically tuning applications ...

A Compiler Tool to Predict Memory Hierarchy Performance of Scientific Codes

B.B. Fraguela
R. Doallo
J. Touriño
E.L. Zapata

January 2004

The study and understanding of memory hierarchy behavior is essential, as it is critical to current ...

Optimizing the memory bandwidth with loop morphing

Gomez, José Ignacio
Marchal, Paul
Verdoolaege, Sven
Pinuel, Luis
Catthoor, Francky

January 2004

The memory bandwidth largely determines the performance of embedded systems. However, very often com...

Program Optimization Based on Compile-Time Cache Performance Prediction

Wesley Kaplow
Boleslaw K. Szymanski

January 1996

We present a novel, compile-time method for determining the cache performance of the loop nests in a...
