Many algorithms and applications in scientific computing exhibit irregular memory access patterns, because consecutive accesses depend on the structure of the data being processed and therefore cannot be known a priori. This manifests as a lack of temporal and spatial locality, so these applications often perform poorly on traditional processor cache hierarchies. This thesis demonstrates that heterogeneous architectures containing Field Programmable Gate Arrays (FPGAs) alongside traditional processors can improve memory access throughput by 2-3x by using the FPGA to insert data directly into the processor cache, eliminating costly cache misses. When fetching data to be processed directly on the FPGA, scatter-gather Direct Memory Acce...
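The following minimal Python sketch shows what a data-dependent access pattern looks like. It uses sparse matrix-vector multiplication (SpMV) in compressed sparse row (CSR) format, a common irregular kernel in scientific computing. SpMV is an assumed representative example, because the abstract does not name a specific workload. The function name `spmv_csr` is hypothetical. Reads of `values` and `col_idx` proceed sequentially. Reads of `x` go through `col_idx`, so their addresses come from the matrix's sparsity structure rather than from the loop counter. A hardware prefetcher cannot predict these reads from the address stream alone.

```python
# Hypothetical illustration of an irregular (data-dependent) access pattern.
# SpMV in CSR format is an assumed example; it is not taken from the thesis.

def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a CSR matrix A.

    Returns (y, x_reads), where x_reads is the order in which elements
    of x were read, i.e. the irregular access stream.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    x_reads = []
    for i in range(n_rows):
        acc = 0.0
        # values[k] and col_idx[k] are read sequentially: good spatial locality.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]           # the index of the next x access is itself data
            x_reads.append(j)
            acc += values[k] * x[j]  # indirect load: address unknown until col_idx[k] is read
        y[i] = acc
    return y, x_reads
```

Take the 3×4 matrix `[[1, 0, 2, 0], [0, 0, 0, 3], [4, 5, 0, 0]]`, stored as `row_ptr = [0, 2, 3, 5]`, `col_idx = [0, 2, 3, 0, 1]` and `values = [1, 2, 3, 4, 5]`, with `x = [1, 2, 3, 4]`. The sketch returns `y = [7.0, 12.0, 14.0]` and the access sequence `[0, 2, 3, 0, 1]`. That sequence jumps back and forth across `x` instead of walking it in order.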
Moore's Law has helped Field Programmable Gate Arrays (FPGAs) scale continuously in speed, capacity ...
Using FPGA-based acceleration of high-performance computing (HPC) applications to reduce energy and ...
The saturation of single-thread performance, along with the advent of the power wall, has resulted i...
Over the past few years, there has been increased interest in building custom computing machines (CCM...
With computing systems becoming ubiquitous, numerous data sets of extremely large size are becoming ...
The performance gap between CPUs and memory has widened significantly since the 1980s, maki...
The decreasing cost of DRAM has enabled and expanded the use of in-memory databases. However, mem...
Accessing memory efficiently enough to keep up with the data processing rate is a well-known problem in...
Conventional compute and memory systems scaling to achieve higher performance and lower cost and pow...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
The last two decades have witnessed two opposing hardware trends, in which DRAM capacity and the acces...
Graphics processing units (GPUs) have become prevalent in modern computing systems. While their high...
This dissertation investigates communication optimization for customizable domain-specific compu...
FPGAs rely on massive datapath parallelism to accelerate applications even with a low clock frequenc...
For efficient acceleration on FPGA, it is essential for external memory to match the throughput of t...