Abstract—The codelet model is a fine-grain, dataflow-inspired program execution model that balances parallelism against runtime-system overhead. It plays an important role in performance, scalability, and energy efficiency in exascale studies such as the DARPA UHPC project and the DOE X-Stack project. As an important application, the Fast Fourier Transform (FFT) has been studied extensively in fine-grain models, including the codelet model. However, existing work focuses on how fine-grain models achieve a more balanced workload than traditional coarse-grain models. In this paper, we make an important observation that the flexibility of execution order of tasks in fine-grain models improves utilization of memory bandwidth...
The FFT support in an Ericsson proprietary DSP is to be improved in order to achieve high performa...
We are now entering the multi-core era; many multi-core chips are designed and manufactured by vario...
Many of the current applications used in battery-powered devices are from digital signal processing,...
Gao, Guang R. The upcoming exa-scale era requires a parallel program execution model capable of achie...
Several SOA (state of the art) self-tuning software libraries exist, such as the Fastest Fourier Tra...
Abstract. We present a new algorithm for the Fast Fourier Transform which is a factor of 2 to 4 time...
Increased complexity of memory systems to ameliorate the gap between the speed of processors and mem...
Gao, Guang R. Over the past decade computer architectures have drastically evolved to circumnavigate ...
We select the Fast Fourier Transform (FFT) to demonstrate a methodology for deriving the optimal par...
For a wide variety of applications, both task and data parallelism must be exploited to achieve the ...
Fast Fourier transform algorithms on large data sets achieve poor performance on various platform...
core architecture as a case study to show how to exploit locality and save energy in the fine-grain ...
In this paper, we present an early version of a SYCL-based FFT library, capable of running on all ma...
Fast Fourier Transform (FFT) is one of the most widely used algorithms in digital signal processing....
Abstract. Modern graphics processing units (GPUs) are becoming more and more suitable for general pur...