The recent dramatic progress in machine learning is partially attributed to the availability of high-performance computers and development tools. The accelerated linear algebra (XLA) compiler is one such tool: it automatically optimises array operations (mostly by fusing them to reduce memory operations) and compiles the optimised operations into high-performance programs for specific target computing platforms. Like machine-learning models, numerical models are often expressed as array operations, so their performance can also be boosted by XLA. This study is the first of its kind to examine the efficiency of XLA for numerical models, and it does so stringently, by comparing XLA's performance with that of optimal implementations. Two s...
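As a minimal sketch of the fusion XLA performs (an illustration under assumed inputs, not the study's benchmark code), jax.jit traces a chain of elementwise array operations and lets XLA compile them into a single fused kernel, avoiding intermediate arrays in memory:

```python
import jax
import jax.numpy as jnp

def scaled_softplus(x, a):
    # three elementwise operations; under jax.jit, XLA fuses them
    # into one kernel instead of materialising each intermediate
    return a * jnp.log1p(jnp.exp(x))

fast = jax.jit(scaled_softplus)
x = jnp.linspace(-3.0, 3.0, 1_000_000)
y = fast(x, 0.5)   # first call compiles; later calls reuse the fused kernel
```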
In this work, numerical algebraic operations are performed by using several li...
Compilers looking for an efficient implementation of a function must find which optimizations are th...
Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) t...
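As a hedged illustration of what "vendor-tuned BLAS" means in practice (the sizes and the OpenBLAS/MKL backends named here are assumptions about a typical SciPy build, not details from the paper), SciPy exposes the linked BLAS directly, so a double-precision matrix multiply can be routed to the tuned dgemm kernel:

```python
import numpy as np
from scipy.linalg import blas

# dgemm: double-precision general matrix-matrix multiply, dispatched to
# whichever BLAS this SciPy build links against (e.g. OpenBLAS or MKL)
A = np.asfortranarray(np.random.rand(512, 512))
B = np.asfortranarray(np.random.rand(512, 512))
C = blas.dgemm(alpha=1.0, a=A, b=B)   # C = 1.0 * A @ B
```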
This thesis describes novel techniques and test implementations for optimizing numerically intensive...
A plethora of program analysis and optimization techniques rely on linear programming at their heart...
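To make the role of linear programming concrete (a toy problem for illustration only; it is not drawn from the paper), such analyses ultimately hand a solver an objective and a set of linear constraints, as scipy.optimize.linprog does here:

```python
from scipy.optimize import linprog

# maximise x + 2y subject to x + y <= 4, x - y <= 1, x, y >= 0
# linprog minimises, so the objective is negated
c = [-1.0, -2.0]
A_ub = [[1.0, 1.0], [1.0, -1.0]]
b_ub = [4.0, 1.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimal point and objective value
```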
Parallel architectures are nowadays present in all computing systems, ranging...
The goal of the LAPACK project is to provide efficient and portable software for dense numerical lin...
On modern architectures, the performance of 32-bit operations is often at least twice as fa...
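A minimal sketch of the mixed-precision idea this line of work exploits (the refinement loop below is an assumed, generic formulation, not the paper's algorithm): factorise the matrix once in fast 32-bit arithmetic, then recover 64-bit accuracy through cheap iterative refinement of the residual:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))
b = rng.standard_normal(1000)

# the O(n^3) factorisation runs in float32, where 32-bit speed pays off most
lu, piv = lu_factor(A.astype(np.float32))
x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
for _ in range(5):
    r = b - A @ x                                   # residual in float64
    d = lu_solve((lu, piv), r.astype(np.float32))   # cheap float32 correction
    x += d.astype(np.float64)
```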
Computers are powerful tools which perform fast, accurate calculations over huge sets of data. Howev...
This dissertation incorporates two research projects: performance modeling and prediction for dense ...