This thesis presents a methodology to automatically determine, at compile time, a data memory organisation suitable for exploiting data reuse and loop-level parallelization, in order to achieve high-performance, low-power designs for data-dominated applications. Moore’s Law has enabled ever more heterogeneous components to be integrated on a single chip. However, extracting maximum performance from these hardware resources efficiently remains challenging. Unlike previous approaches, which mainly focus on making efficient use of computational resources, our focus is on data memory organisation and input-output bandwidth, which are typical stumbling blocks for existing hardware compilation schemes. To optimize accesses to...
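The abstract is truncated before it describes its method, so the following is only a toy illustration of why exploiting data reuse relieves input-output bandwidth, not the thesis's technique. It assumes a 3-tap sliding-window filter and models every list access to `x` as one "off-chip" read; the function names `filter_no_reuse` and `filter_with_reuse` are hypothetical. The second version keeps a small on-chip window buffer so each input element is fetched exactly once.

```python
from collections import deque
from typing import List, Tuple


def filter_no_reuse(x: List[float], w: List[float]) -> Tuple[List[float], int]:
    """Sliding-window filter that re-fetches every input it needs.

    Assumption: each access to x counts as one off-chip memory read.
    """
    k = len(w)
    reads = 0
    y = []
    for i in range(len(x) - k + 1):
        acc = 0.0
        for j in range(k):
            acc += w[j] * x[i + j]  # one off-chip read per tap
            reads += 1
        y.append(acc)
    return y, reads


def filter_with_reuse(x: List[float], w: List[float]) -> Tuple[List[float], int]:
    """Same filter, but inputs are kept in a k-entry on-chip window buffer.

    Each input element is read from off-chip memory exactly once; the
    deque's maxlen evicts the oldest element when a new one arrives.
    """
    k = len(w)
    window: deque = deque(maxlen=k)  # models a small on-chip register buffer
    reads = 0
    y = []
    for value in x:
        window.append(value)  # single off-chip read per input element
        reads += 1
        if len(window) == k:
            # window holds x[i], ..., x[i+k-1] in order, matching w[0..k-1]
            y.append(sum(wj * xj for wj, xj in zip(w, window)))
    return y, reads
```

For an input of length N and a k-tap window, the first version performs k(N - k + 1) reads and the second only N, while producing identical outputs; with N = 6 and k = 3 that is 12 reads versus 6. The saving grows with the window size, which is the kind of bandwidth reduction a compile-time memory organisation can target.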
Quintillions of bytes of data are generated every day in this era of big data. Machine learning tech...
This paper describes techniques for translating out-of-core programs written in a data parallel lang...
The pin count largely determines the cost of a chip package, which is often comparable to the cost o...
A nonlinear optimization framework is proposed in this paper to automate exploration of the...
With the large resource densities available on modern FPGAs, it is often the available memory bandwi...
This dissertation presents a novel decoupled latency tolerance technique for 1000-core data parallel...
The power, frequency, and memory wall problems have caused a major shift in mainstream computing by ...
Trends in computer engineering place renewed emphasis on increasing parallelism and heterogeneity. ...
As the demand increases for high performance and power efficiency in modern computer runtime systems...
This thesis is concerned with hardware approaches for maximizing the number of independent instructi...
The memory system is a major bottleneck in achieving high performance and energy efficiency for vari...
Growing demand for computational performance and the rising cost of chip design and manufacturing...
University of Minnesota Ph.D. dissertation. September 2014. Major: Computer Science. Advisor: Pen-Ch...
This thesis proposes novel optimisations for high performance runtime reconfigurable designs. For a...