Improving the data locality of tensor data structures is a crucial optimization for maximizing the performance of Machine Learning and intensive Linear Algebra applications. While CPUs and GPUs improve data locality through automated caching mechanisms, FPGAs let the developer specify how data structures are allocated. Although this feature enables a high degree of customization, the increasing complexity and memory footprint of modern applications make any manual search for an optimal allocation impractical. For this reason, we propose a compiler optimization that automatically improves the tensor allocation of high-level software descriptions. The optimization is controlled by a flexible cost model that can be tuned by means of simple yet...
Empirical optimizers like ATLAS have been very effective in optimizing computational kerne...
We present a lightweight Coq framework for optimizing tensor kernels written in a pure, functional a...
This thesis studies data parallelism in tensor assignments. Building on an existing domain-specific ...
140 pages. Tensor algebra lives at the heart of big data applications. Where classical machine learnin...
Tensors are higher-dimensional analogs of matrices, and represent a key data abstraction for many ap...
The emergence of deep learning has launched many works in deep learning accelerators. To fully reali...
Complex tensor contraction expressions arise in accurate electronic structure models in quantum chem...
This paper discusses a program synthesis system to facilitate the generation of high-performance pa...
Optimizing the implementation of tensor computations is essential to exploiting the full capacity of...