We describe the design and FPGA implementation of a 3D torus network (TNW) that provides nearest-neighbor communication between commodity multi-core processors. The aim of the project is to build tightly interconnected, scalable parallel systems for scientific computing. The design includes VHDL code that implements, on recent FPGA devices, a network processor which the CPU accesses through a PCIe interface and which controls the external PHYs of the physical links. A Linux driver and a library implementing custom communication APIs are also provided. The TNW has been successfully integrated into two recent parallel machine projects, QPACE and AuroraScience. We describe some details of the porting of the TNW for th...
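The abstract states that the TNW connects each node to its nearest neighbors in a 3D torus, but it does not give the node-addressing scheme. As a minimal sketch, assuming nodes are numbered with x varying fastest and an illustrative 4×4×4 partition, the six neighbors of a node (one in each direction along each axis, with wrap-around links at the edges) can be computed as below. The names `DIMS`, `coord_to_rank`, `rank_to_coord` and `torus_neighbors` are hypothetical and are not part of the TNW library's API.

```python
# Hypothetical torus extents (NX, NY, NZ); the abstract does not state the
# partition sizes actually used on QPACE or AuroraScience.
DIMS = (4, 4, 4)


def coord_to_rank(c, dims=DIMS):
    """Linearize an (x, y, z) coordinate into a rank, x varying fastest (assumed ordering)."""
    x, y, z = c
    nx, ny, _ = dims
    return x + nx * (y + ny * z)


def rank_to_coord(rank, dims=DIMS):
    """Inverse of coord_to_rank."""
    nx, ny, _ = dims
    return (rank % nx, (rank // nx) % ny, rank // (nx * ny))


def torus_neighbors(rank, dims=DIMS):
    """Return the six nearest-neighbor ranks in the order +x, -x, +y, -y, +z, -z.

    Python's % maps -1 to n-1 and n to 0, which models the torus wrap-around links.
    """
    c = rank_to_coord(rank, dims)
    out = []
    for axis in range(3):
        for step in (+1, -1):
            n = list(c)
            n[axis] = (n[axis] + step) % dims[axis]
            out.append(coord_to_rank(tuple(n), dims))
    return out


if __name__ == "__main__":
    print(torus_neighbors(0))  # [1, 3, 4, 12, 16, 48]
```

Because every node has exactly six links regardless of its position, edge nodes need no special-case code, which is one reason a torus suits regular nearest-neighbor workloads.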
Applications running on custom architectures with hundreds of specialized processing elements (PEs) ...
The fast Fourier transform (FFT) is of intense interest to the scientific community. Its utility in...
A 2-dimensional mesh has low design complexity and a very good match to the rectangular processor ar...
In this chapter we describe the architecture of a torus interconnect and its implementation on FPGAs...
FPGA-Centric Clusters (FCCs) with the FPGAs directly linked through their Multi-Gigabit Transceivers...
With the increasing capacity of FPGAs following Moore's law, it is possible to build in a single...
Thesis (M.S.)--Boston University. Applications that require highly parallel computing along with low l...
Abstract. Modern Graphics Processing Units (GPUs) are now considered accelerators for general purpos...
High Performance Computing (HPC) has matured to where it is an essential third pillar, along with th...
The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Differe...
The scalable simulation of neuron communication needs a large amount of computing resources. The high...
Abstract. The research article presents the simulation and FPGA synthesis of mesh, torus and ring Netw...
Deep learning is a large, scalable network architecture based on neural networks. It is currently an extr...
A key infrastructure required to make heterogeneous clusters easier to use is a standard communicati...