Automatic library generators, such as ATLAS [11], Spiral [8] and FFTW [2], are promising technologies for generating efficient code for different computer architectures. Library generators usually tune programs using two layers of optimization: search at the algorithm level, and optimization of micro kernels. The micro optimizations are important for the performance of the library because the optimized micro kernels are the basis of the algorithm-level search and have a great impact on the overall performance of the generated libraries. Successfully optimizing a micro kernel requires a thorough understanding of the interaction between architectural features and highly optimized code. However, the literature on library generators focuses more...
In this session we show, in two case studies, how the roofline feature of Intel Advisor has been uti...
Thesis (Ph.D.)--University of Washington, 2021. Seamless gains in performance from technology scaling ...
Abstract. Achieving peak performance in important numerical kernels such as dense matrix multiply or...
Several SOA (state of the art) self-tuning software libraries exist, such as the Fastest Fourier Tra...
Abstract. This paper presents compiler technology that targets general purpose microprocessors augme...
This thesis describes novel techniques and test implementations for optimizing numerically intensive...
We present an auto-tuning framework for FFTs on graphics processors (GPUs). Due to complex design o...
We present an auto-tuning approach to optimize application performance on emerging multicore archite...
Li, Xiaoming. Generating high performance Fast Fourier Transform (FFT) library is an important researc...
In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transformation...
This paper analyzes the limits of FFT performance on FPGAs. For this purpose, an FFT generation tool ...
For decades, computer scientists have sought guidance on how to evolve architectures, languages, and...
In this paper, we present an early version of a SYCL-based FFT library, capable of running on all ma...
FFT implementations today generally fall into two categories: Library generators (such as FFTW and S...
Obtaining high performance without machine-specific tuning is an important goal of scientific applic...