The Cerebras CS-1 is a computing system based on a wafer-scale processor with nearly 400,000 compute cores. It is intended for training of and inference on deep neural networks, and its architecture has several features designed specifically for this and related fields. One of these is a sophisticated SIMD engine that can mimic a rectangular loop nest of depth at most four. To achieve optimal performance, it is crucial to use SIMD instructions as much as possible. This paper describes a high-level polyhedral compiler that takes a high-level algorithm description, written manually or extracted from a TensorFlow computation graph, and generates input to the low-level C-based compiler. In this intermediate code, the use of SI...
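To make the phrase "a rectangular loop nest of depth at most four" concrete, here is a minimal sketch in plain Python. It is not the CS-1 instruction set and not the output of the compiler described above. The function names `run_rect_nest` and `row_major_strides`, the stride-based addressing, and the choice of element-wise addition are all assumptions introduced for illustration. The idea is that a nest whose loop bounds are constants (hence "rectangular") can be summarised by a few trip counts and per-operand strides, so one descriptor-like call can stand in for the whole nest.

```python
from itertools import product


def row_major_strides(extents):
    """Return row-major strides for a dense array with the given extents."""
    strides, step = [], 1
    for n in reversed(extents):
        strides.append(step)
        step *= n
    return list(reversed(strides))


def run_rect_nest(extents, dst, dst_strides, a, a_strides, b, b_strides):
    """Emulate one hypothetical SIMD operation covering a rectangular loop nest.

    extents   -- constant trip counts, one per loop level (at most four)
    *_strides -- one stride per loop level, addressing flat Python lists;
                 a stride of 0 reuses (broadcasts) the operand on that level
    Computes dst[...] = a[...] + b[...] over the whole iteration space.
    """
    if len(extents) > 4:
        raise ValueError("loop nest deeper than four levels")
    for idx in product(*(range(n) for n in extents)):
        d = sum(i * s for i, s in zip(idx, dst_strides))
        x = sum(i * s for i, s in zip(idx, a_strides))
        y = sum(i * s for i, s in zip(idx, b_strides))
        dst[d] = a[x] + b[y]


# Usage: a 2 x 3 x 2 x 2 nest adding two dense operands element by element.
extents = (2, 3, 2, 2)
strides = row_major_strides(extents)
a = list(range(24))
b = [10] * 24
out = [0] * 24
run_rect_nest(extents, out, strides, a, strides, b, strides)
```

In this model, a loop nest with non-constant bounds or more than four levels would have to be split into several such calls, which is the kind of decision a compiler targeting such an engine has to make.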
Many media processing algorithms suffer from long execution times, which are most often not acceptab...
High-level loop transformations change the order in which basic computations in a program are execut...
Single instruction, multiple data (SIMD) is a class of parallel computing that involves executing a ...
The Single Instruction Multiple Data (SIMD) paradigm promises speedup at relatively low silicon area...
Although Single Instruction Multiple Data (SIMD) units are available in general purpose processors a...
This thesis discusses techniques that can be used to optimize the run time of algorithms. For a demon...
This paper introduces TIRAMISU, a polyhedral framework designed to generate high performance code fo...
Title: SIMD code generator Author: Karel Tuček Department: Department of Software Engineering Superv...
This paper describes methods to adapt existing optimizing compilers for sequential languages to prod...
The polyhedral model for loop parallelization has proved to be an effective tool for advanced optim...
As an effective way of utilizing data parallelism in applications, SIMD architecture has been adopte...
Automatic parallelization is becoming more important as parallelism becomes ub...
Computers become increasingly complex. Current and future systems feature configurable hardware, mul...
High-level synthesis (HLS) allows hardware to be directly produced from behavi...