Abstract—In high-performance computing on distributed-memory systems, communication often represents a significant part of the overall execution time. The relative cost of communication will certainly continue to rise as compute-density growth follows the current technology and industry trends. Design of lower-communication alternatives to fundamental computational algorithms has become an important field of research. For distributed 1-D FFT, communication cost has hitherto remained high as all industry-standard implementations perform three all-to-all internode data exchanges (also called global transposes). These communication steps indeed dominate execution time. In this paper, we present a mathematical framework from which many single-...
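To make the "three global transposes" concrete, the following is a minimal single-process sketch of the conventional transpose-based (often called six-step) 1-D FFT that the abstract contrasts against. It is not the framework proposed in the paper. The function names `dft`, `transpose`, and `six_step_fft`, and the factorization N = n1 * n2, are assumptions introduced for illustration; in a distributed run each `transpose` call would correspond to one all-to-all exchange, while the row transforms and twiddle scaling stay local.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT, standing in for a local (node-level) FFT."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Matrix transpose; in a distributed setting this is the all-to-all step."""
    return [list(col) for col in zip(*m)]


def six_step_fft(x, n1, n2):
    """Length n1*n2 DFT of x via three transposes (illustrative sketch)."""
    n = n1 * n2
    assert len(x) == n
    # View the input as an n1 x n2 row-major matrix.
    a = [list(x[r * n2:(r + 1) * n2]) for r in range(n1)]
    b = transpose(a)                      # transpose 1: n2 x n1
    c = [dft(row) for row in b]           # n2 local transforms of length n1
    # Twiddle scaling by w_N^(j2 * k1), purely local.
    c = [
        [c[j2][k1] * cmath.exp(-2j * cmath.pi * j2 * k1 / n) for k1 in range(n1)]
        for j2 in range(n2)
    ]
    d = transpose(c)                      # transpose 2: n1 x n2
    e = [dft(row) for row in d]           # n1 local transforms of length n2
    f = transpose(e)                      # transpose 3: restores natural order
    return [val for row in f for val in row]
```

Calling `six_step_fft(x, 3, 4)` on a length-12 input should agree with `dft(x)` up to rounding error. The sketch shows that only the three `transpose` calls move data between rows, which is why those steps dominate once rows live on different nodes.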
In this work, we propose parallel FFT algorithms for medium-to-coarse grain hypercube-connected mult...
A generalized algorithm has been derived for the execution of the Cooley-Tukey FFT algorithm on a di...
In this paper, the problem of computing a one-dimensional FFT on a c-dimensional torus multicomputer...
This paper demonstrates the first tera-scale performance of Intel® Xeon Phi™ coprocessors on 1D ...
Abstract. The development of the fast Fourier transform (FFT) and its numerous variants in the past 30...
In this paper we propose a new approach to the study of the communication requirements of distribute...
This paper addresses the problem of monodimensional (1D) FFT parallel computation on constant-valenc...
This paper presents a new and optimal parallel implementation of multidimensional fast Fourier trans...
The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Differe...
The fast Fourier transform (FFT) is of intense interest to the scientific community. Its utility in...
In this paper, the computation of a one-dimensional FFT on a c-dimensional torus multicomputer is an...
Abstract. This paper introduces a formal framework for automatically generating performance optimize...
Abstract. The present paper begins with a survey of various up-to-date parallel 3-D FFT algorithms and...