The Halide DSL and compiler have enabled high-performance code generation for image processing pipelines targeting heterogeneous architectures through the separation of algorithmic description and optimization schedule. However, automatic schedule generation is currently only possible for multi-core CPU architectures. As a result, expert knowledge is still required when optimizing for platforms with GPU capabilities. In this work, we extend the current Halide Autoscheduler with novel optimization passes to efficiently generate schedules for CUDA-based GPU architectures. We evaluate our proposed method across a variety of applications and show that it can achieve performance competitive with that of manually tuned Halide schedules, or in man...
Modern automotive-grade embedded computing platforms feature high-performance Graphics Processing Un...
In this study, we provide an extensive survey of a wide spectrum of scheduling methods for multitaskin...
Recent advances in GPUs (graphics processing units) have led to massively parallel hardware that is eas...
We present a new algorithm to automatically generate high-performance GPU implementations of complex...
Efficient code generation for image processing applications continues to pose a challenge i...
We present a new algorithm to automatically schedule Halide programs for high-performance image proc...
Tensor Cores (TCUs) are specialized units first introduced by NVIDIA in the Volta microarchitecture ...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
In this paper, we present a heavily exploration-oriented implementation of genetic algorithms to be e...
We present a high-level synthesis framework to synthesize optimized hardware on FPGAs from algorithm...
With the emergence of General Purpose computation on GPU (GPGPU) and corresponding programming fram...
Over the last few years, the ever-increasing use of Graphics Processing Units (GPUs) in safety-relate...
In this paper, we present a comparison of scheduling strategies for heterogene...
Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms ...