In high-performance computing on distributed-memory systems, communication often represents a significant part of the overall execution time. The relative cost of communication will certainly continue to rise as compute-density growth follows current technology and industry trends. The design of lower-communication alternatives to fundamental computational algorithms has therefore become an important field of research. For the distributed 1-D FFT, communication cost has hitherto remained high because all industry-standard implementations perform three all-to-all internode data exchanges (also called global transposes). These communication steps dominate execution time. In this paper, we present a mathematical framework from which many single-all-to-all...
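To make the three all-to-all exchanges concrete, here is a minimal sequential sketch that assumes the standard six-step (transpose-based) Cooley-Tukey factorization N = n1 * n2. The abstract does not say which factorization the industry-standard libraries use, so treat this as an illustration only. In a distributed setting the rows of each intermediate matrix would live on different nodes, so each `transpose` call marks where a global all-to-all exchange would occur. The function names (`naive_dft`, `transpose`, `six_step_fft`) are hypothetical, and the per-row DFTs are deliberately naive O(n^2) loops for readability.

```python
import cmath


def naive_dft(x):
    """Direct O(n^2) DFT using the exp(-2*pi*i*n*k/len) sign convention."""
    n_len = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_len) for n in range(n_len))
        for k in range(n_len)
    ]


def transpose(rows):
    """Swap matrix axes; with rows spread over nodes this is an all-to-all."""
    return [list(col) for col in zip(*rows)]


def six_step_fft(x, n1, n2):
    """Length-(n1*n2) DFT via the six-step scheme (hypothetical helper name)."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n2 x n1 row-major matrix: a[j2][j1] = x[j1 + n1*j2].
    a = [list(x[j2 * n1:(j2 + 1) * n1]) for j2 in range(n2)]
    # Transpose #1 (all-to-all #1): n1 rows of length n2.
    b = transpose(a)
    # n1 independent length-n2 DFTs, purely local work on each node.
    c = [naive_dft(row) for row in b]
    # Twiddle factors w_N^(j1*k2), also local.
    for j1 in range(n1):
        for k2 in range(n2):
            c[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    # Transpose #2 (all-to-all #2): n2 rows of length n1.
    d = transpose(c)
    # n2 independent length-n1 DFTs, local again.
    e = [naive_dft(row) for row in d]
    # Transpose #3 (all-to-all #3): put X[k2 + n2*k1] in natural order.
    f = transpose(e)
    return [val for row in f for val in row]


if __name__ == "__main__":
    signal = [float((i * i) % 7) for i in range(12)]
    fast = six_step_fft(signal, 3, 4)
    ref = naive_dft(signal)
    print(max(abs(p - q) for p, q in zip(fast, ref)))
```

Running the module compares the six-step result with the direct DFT; the printed difference should be at floating-point round-off level. Only the three `transpose` calls move data between rows, which is why they, and not the arithmetic, dominate on a distributed machine. The single-all-to-all algorithms the abstract refers to reduce this to one exchange; the sketch does not model that reduction.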
A generalized algorithm has been derived for the execution of the Cooley-Tukey FFT algorithm on a di...
We present an MPI-based software library for computing fast Fourier transforms (FFTs) on m...
This paper presents a comprehensive story of the development of simpler performance models...
In high-performance computing on distributed-memory systems, communication often represents...
This paper demonstrates the first tera-scale performance of Intel® Xeon Phi™ coprocessors on 1D ...
The development of the fast Fourier transform (FFT) and its numerous variants in the past 30...
This paper presents a new and optimal parallel implementation of multidimensional fast Fourier trans...
The fast Fourier transform (FFT) is of intense interest to the scientific community. Its utility in...
This paper addresses the problem of monodimensional (1D) FFT parallel computation on constant-valenc...
The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Differe...
In this paper, the computation of a one-dimensional FFT on a c-dimensional torus multicomputer is an...
The present paper begins with a survey of various up-to-date parallel 3-D FFT algorithms and...
This paper introduces a formal framework for automatically generating performance optimize...
In this paper we propose a new approach to the study of the communication requirements of distribute...
In this work, we propose parallel FFT algorithms for medium-to-coarse grain hypercube-connected mult...