Scientific computing plays an important role in scientific research, and supercomputers are built to meet the computational needs of large-scale scientific applications. Achieving high performance on today's supercomputers is difficult, in large part because of the complexity of the node architectures, which include wide-issue instruction-level parallelism, SIMD operations, multiple cores, multiple threads per core, and a deep memory hierarchy. In addition, compute performance has grown faster than memory bandwidth, making memory bandwidth a scarce resource. Various optimization methods, including tiling and prefetching, have been proposed to make better use of the memory hierarchy. However, due to architectural di...
In this paper, we present Patus, a code generation and auto-tuning framework for stencil computation...
Abstract. Empirical performance optimization of computer codes using autotuners has received significa...
A wide range of scientific and machine learning applications depend on highly ...
Abstract. The increasing complexities of modern architectures require compilers to extensively apply...
The recent transformation from an environment where gains in computational performance came from inc...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
Optimizing the implementation of tensor computations is essential to exploiting the full capacity of...
In high-performance computing, excellent node-level performance is required for the efficient use of...
This dissertation is concerned with the development of novel high-performance algorithms for tensor ...
140 pages. Tensor algebra lives at the heart of big data applications. Where classical machine learnin...
Over the last several decades we have witnessed tremendous change in the landscape of computer archi...
The emergence of deep learning has launched many works in deep learning accelerators. To fully reali...
Improving data locality of tensor data structures is a crucial optimization for maximizing the perfo...
Abstract—Autotuning systems intelligently navigate a search space of possible implementations of a c...