This thesis describes novel techniques and test implementations for optimizing numerically intensive codes. Our main focus is on how given algorithms can be adapted to run efficiently on modern microprocessors, exploring several architectural features including instruction selection and the access patterns that arise from having several levels of cache. Our approach is also shown to be relevant for multicore architectures. Our primary target applications are linear algebra routines in the form of matrix multiplication with dense matrices. We analyze how current compilers, microprocessors, and common optimization techniques (like loop tiling and data relocation) interact. A tunable assembly code generator is developed, built, and tested on a basic BLAS le...
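The loop tiling mentioned in this abstract can be illustrated with a minimal sketch. The code below is not drawn from the thesis itself; the matrix dimension `N` and blocking factor `TILE` are arbitrary assumptions chosen so a tile of each operand fits comfortably in a small L1 cache. It contrasts a naive triple loop with a tiled variant that iterates over TILE x TILE sub-matrices.

```c
#include <string.h>

#define N 64        /* matrix dimension (assumed, for illustration) */
#define TILE 16     /* tile edge; assumed to keep the working set cache-resident */

/* Naive triple loop over row-major N x N matrices: reference version. */
static void matmul_naive(const double *A, const double *B, double *C) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k++)
                s += A[i * N + k] * B[k * N + j];
            C[i * N + j] = s;
        }
}

/* Loop-tiled variant: the three outer loops walk TILE x TILE blocks,
 * so each block of A, B, and C is reused from cache many times before
 * being evicted. The inner accumulation order over k matches the naive
 * version, so the results agree exactly. */
static void matmul_tiled(const double *A, const double *B, double *C) {
    memset(C, 0, N * N * sizeof(double));
    for (int ii = 0; ii < N; ii += TILE)
        for (int kk = 0; kk < N; kk += TILE)
            for (int jj = 0; jj < N; jj += TILE)
                for (int i = ii; i < ii + TILE; i++)
                    for (int k = kk; k < kk + TILE; k++) {
                        double a = A[i * N + k];
                        for (int j = jj; j < jj + TILE; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

In practice the tile size would be tuned to the cache sizes of the target microprocessor, which is exactly the parameter space a tunable code generator of the kind described above can search.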
The recent dramatic progress in machine learning is partially attributed to the availability of high...
Abstract. The use of highly optimized inner kernels is of paramount importance for obtaining effici...
We are presenting a new method and algorithm for solving several common problems of linear algebra a...
This paper examines how to write code to gain high performance on modern computers as well as the im...
Abstract. Traditional parallel programming methodologies for improving performance assume cache-bas...
Abstract—Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) t...
The goal of the LAPACK project is to provide efficient and portable software for dense numerical lin...
Matrix computations lie at the heart of most scientific computational tasks. The solution of linear ...
The multiplication of a sparse matrix with a dense vector is a performance critical computational ke...
Current compilers cannot generate code that can compete with hand-tuned code i...
Almost every modern processor is designed with a memory hierarchy organized into several levels, eac...
Out-of-core implementations of algorithms for dense matrix computations have traditionally focused o...
In this paper, a new methodology for computing the Dense Matrix Vector Multipl...
Abstract. This paper presents a study of performance optimization of dense matrix multiplication on ...
A plethora of program analysis and optimization techniques rely on linear programming at their heart...