Spatial processing of sparse, irregular, double-precision floating-point computation using a single field-programmable gate array (FPGA) enables up to an order of magnitude speedup (mean 2.8× speedup) over a conventional microprocessor for the SPICE circuit simulator. We develop a parallel, FPGA-based, heterogeneous architecture customized for accelerating the SPICE simulator to deliver this speedup. To properly parallelize the complete simulator, we decompose SPICE into its three constituent phases (model evaluation, sparse matrix-solve, and iteration control) and customize a spatial architecture for each phase independently. Our heterogeneous FPGA organization mixes very large instruction word, dataflow and streaming architectures into a co...
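The three-phase decomposition described above (model evaluation, sparse matrix-solve, iteration control) can be illustrated with a minimal DC operating-point sketch. Everything in it is a hypothetical illustration, not the authors' implementation: the two-resistor-plus-diode circuit and its component values, the dense Gaussian-elimination solver (a stand-in for the sparse solver SPICE actually uses), the 0.1 V step limit, and the tolerance parameters `reltol`/`vntol` (named loosely after SPICE options) are all assumptions chosen for readability.

```python
import math

# Hypothetical circuit for illustration only: source VS -> R1 -> node 0,
# R2 from node 0 to node 1, diode from node 1 to ground.
# Unknowns: node voltages v = [v0, v1].
VS, R1, R2 = 5.0, 1000.0, 1000.0
IS, VT = 1e-14, 0.025852  # diode saturation current (A), thermal voltage (V)


def evaluate_models(v):
    """Phase 1 (model evaluation): linearize the diode at the current
    operating point and stamp every device into the nodal system A v = b."""
    vd = v[1]
    i_d = IS * (math.exp(vd / VT) - 1.0)
    g_d = IS / VT * math.exp(vd / VT)  # small-signal conductance dI/dV
    i_eq = i_d - g_d * vd              # companion-model current source
    g1, g2 = 1.0 / R1, 1.0 / R2
    A = [[g1 + g2, -g2],
         [-g2, g2 + g_d]]
    b = [VS * g1, -i_eq]
    return A, b


def solve(A, b):
    """Phase 2 (matrix-solve): Gaussian elimination with partial pivoting,
    a dense stand-in for a sparse LU solve."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def newton_dc(v, reltol=1e-6, vntol=1e-6, max_step=0.1, max_iter=100):
    """Phase 3 (iteration control): repeat phases 1 and 2 until every node
    voltage changes by at most reltol * |v| + vntol."""
    v = list(v)
    for iteration in range(1, max_iter + 1):
        A, b = evaluate_models(v)
        v_new = solve(A, b)
        # Crude diode-voltage limiting (assumption): cap the per-iteration
        # change so exp() is never evaluated far from the last iterate.
        step = v_new[1] - v[1]
        if abs(step) > max_step:
            v_new[1] = v[1] + math.copysign(max_step, step)
        converged = all(
            abs(new - old) <= reltol * max(abs(new), abs(old)) + vntol
            for new, old in zip(v_new, v)
        )
        v = v_new
        if converged:
            return v, iteration
    raise RuntimeError("Newton iteration did not converge")


if __name__ == "__main__":
    voltages, iterations = newton_dc([0.0, 0.0])
    print(voltages, iterations)
```

Starting from `[0.0, 0.0]`, the diode voltage should settle near 0.67 V. The point of the sketch is the loop structure: model evaluation and matrix-solve repeat under the control of a convergence test, and it is exactly this structure that the abstract splits into separately customized spatial architectures.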
© 2016 ACM. TinySPICE was a SPICE simulator on GPU developed to achieve dramatic speedups in statist...
As part of our effort to parallelise SPICE simulations over multiple FPGAs, we present a parallel FP...
Recently, FPGAs have been integrated into HPC clusters in order to boost their computational perform...
Spatial processing of sparse, irregular, double-precision floating-point computation using a single ...
Spatial processing of sparse, irregular floating-point computation using a single FPGA enables up...
Abstract—Single-FPGA spatial implementations can provide an order of magnitude speedup over sequenti...
Fine-grained dataflow processing of sparse matrix-solve computation (Ax = b) in the SPICE circuit si...
Fine-grained dataflow processing of sparse Matrix-Solve computation (Ax⃗ = b⃗) in the SPICE circuit ...
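The entries above describe processing the sparse matrix-solve (Ax = b) as fine-grained dataflow. SPICE's matrix-solve consists of an LU factorization followed by triangular solves; as a simplified illustration of the dataflow idea, and not the authors' method, the sketch below applies level scheduling to a sparse lower-triangular solve. Each unknown x_i becomes a node that can fire once every x_j it reads is available, and the number of levels is the length of the critical path. The example matrix, the list-of-dicts storage format, and the function names are hypothetical.

```python
from collections import defaultdict


def level_schedule(L):
    """Assign each row i of a sparse lower-triangular matrix L to a dataflow
    level: row i can fire only after every x_j it reads (L[i][j] != 0, j < i)
    is available. L is a list of dicts mapping column -> nonzero value."""
    level = [0] * len(L)
    for i, row in enumerate(L):
        deps = [j for j in row if j < i]
        level[i] = 1 + max((level[j] for j in deps), default=-1)
    return level


def solve_lower_by_levels(L, b):
    """Forward substitution L x = b, evaluated wave by wave. Rows within one
    wave depend only on earlier waves, so dataflow hardware could evaluate
    them concurrently; this sketch simply runs them in a loop."""
    level = level_schedule(L)
    waves = defaultdict(list)
    for i, lv in enumerate(level):
        waves[lv].append(i)
    x = [0.0] * len(b)
    for lv in sorted(waves):
        for i in waves[lv]:
            s = sum(val * x[j] for j, val in L[i].items() if j < i)
            x[i] = (b[i] - s) / L[i][i]
    return x, level


# Hypothetical 4x4 example: rows 0 and 1 are independent, row 2 needs x0,
# and row 3 needs x1 and x2.
L = [
    {0: 2.0},
    {1: 1.0},
    {0: 1.0, 2: 4.0},
    {1: 3.0, 2: 1.0, 3: 1.0},
]
b = [2.0, 3.0, 9.0, 14.0]
x, level = solve_lower_by_levels(L, b)
print(x, level)  # [1.0, 3.0, 2.0, 3.0] [0, 0, 1, 2]
```

Here the four rows fall into three levels, so a dataflow implementation could finish in three dependent steps instead of four sequential ones; how much is saved in general depends on the sparsity structure of the matrix.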
SPICE, from the University of California, Berkeley, is the de facto world standard for circuit si...
Abstract—Many stand-alone, FPGA-based accelerators separate the implementation of a computation int...
Many stand-alone, FPGA-based accelerators separate the implementation of a computation into two comp...
Automated code generation and performance tuning techniques for concurrent architectures such as GP...
In this paper, we developed a simulation-based architecture evaluation framework for field-programma...
Automated code generation and performance tuning techniques for concurrent architectures such as GPU...
Due to the ever-increasing complexity of circuits, EDA tools and algorithms are demanding more computati...