Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory and energy requirements. However, despite its promise, low-precision arithmetic has received little attention for Gaussian processes (GPs), largely because GPs require sophisticated linear algebra routines that are unstable in low precision. We study the different failure modes that can occur when training GPs in half precision. To circumvent these failure modes, we propose a multi-faceted approach involving conjugate gradients with re-orthogonalization, mixed precision, and preconditioning. Our approach significantly improves the numerical stability and practical performance of conjugate gradients in low precision over ...
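To make the recipe concrete, below is a minimal sketch (in PyTorch, not the paper's implementation) of a preconditioned conjugate gradient solver that keeps the kernel matrix and its matrix-vector products in half precision, accumulates inner products and iterate updates in single precision, and re-orthogonalizes each new residual against earlier ones. The function name mixed_precision_pcg, the Jacobi (diagonal) preconditioner, the Euclidean Gram-Schmidt re-orthogonalization, and the toy RBF kernel are illustrative simplifications chosen here, not details taken from the paper.

import torch

# Prefer a GPU: half-precision matrix products are what tensor cores accelerate,
# and some CPU builds of PyTorch do not support float16 matmul at all.
device = "cuda" if torch.cuda.is_available() else "cpu"


def mixed_precision_pcg(K, y, max_iters=100, tol=1e-4):
    """Solve K x = y with preconditioned CG, keeping K in float16."""
    K_half = K.to(torch.float16)              # kernel matrix stored and applied in half precision
    precond_inv = 1.0 / K.diagonal().float()  # Jacobi preconditioner, kept in float32

    x = torch.zeros_like(y, dtype=torch.float32)
    r = y.float() - (K_half @ x.half()).float()  # residual accumulated in float32
    z = precond_inv * r
    d = z.clone()

    # Keep normalized past residuals so each new residual can be re-orthogonalized
    # against them; this fights the loss of orthogonality caused by half-precision
    # round-off, at the price of O(n * iterations) extra memory.
    R = [r / r.norm()]

    for _ in range(max_iters):
        Kd = (K_half @ d.half()).float()      # half-precision matvec, result upcast to float32
        alpha = (r @ z) / (d @ Kd)
        x = x + alpha * d
        r_new = r - alpha * Kd

        # Gram-Schmidt re-orthogonalization of the new residual.
        for r_prev in R:
            r_new = r_new - (r_new @ r_prev) * r_prev

        if r_new.norm() < tol:
            break

        z_new = precond_inv * r_new
        beta = (r_new @ z_new) / (r @ z)
        d = z_new + beta * d
        r, z = r_new, z_new
        R.append(r_new / r_new.norm())

    return x


# Toy usage: an RBF kernel matrix with jitter on synthetic data.
n = 500
X = torch.randn(n, 2, device=device)
K = torch.exp(-0.5 * torch.cdist(X, X).pow(2)) + 1e-2 * torch.eye(n, device=device)
y = torch.randn(n, device=device)
solution = mixed_precision_pcg(K, y)
print("residual norm:", (K @ solution - y).norm().item())

The split of work in the sketch mirrors the mixed-precision idea: the expensive O(n^2) matrix-vector products run in half precision, while the cheap O(n) reductions that drive the step sizes stay in single precision, where round-off would otherwise derail convergence.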