High-Level Synthesis (HLS) has advanced significantly in compiling high-level “soft” programs into efficient register-transfer level (RTL) “hard” specifications. However, manually rewriting C-like code is still often required in order to effectively optimize the access performance of synthesized memory subsystems. As such, extensive research has been performed on developing and implementing automated memory optimization techniques, among which memory banking has been a key technique for access performance improvement. However, several key questions remain to be answered: given a stencil-based computing kernel, what constitutes an optimal memory banking scheme that minimizes the number of memory banks required for conflict-free accesses? Fur...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
Most computers today are based on the von Neumann architecture introduced by John von Neumann in 194...
Application codes reliably achieve performance far less than the advertised capabilities of existing...
Irregular memory access pattern in non-stencil kernel computing renders the well-known hyperplane-[1...
High-Level Synthesis (HLS) tools are a set of algorithms that allow programmers to obtain implementa...
Stencil-based kernels constitute the core of many scientific applications on block-structured grids....
As we witness the breakdown of Dennard scaling, we can no longer get faster computers by shrinking t...
Stencil computations form the basis for computer simulations across almost every field of science, s...
Processor cache memory management deeply impacts performance and power consumption of electronic de...
Being 'memory-centric', the single-chip distributed logic-memory (DLM) architecture can signif...
Although modern supercomputers are composed of multicore machines, one can find scientists that stil...
This work introduces a generalized framework for automatically tuning stencil computations to achiev...
This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like op...
We are witnessing a fundamental paradigm shift in computer design. Memory has been and is becoming m...