The Near Memory Processor (NMP) is a multithreaded vector processor integrated with the memory controller. The NMP operates as a subordinate, executing work on request from the main processors. It complements conventional superscalar processors and is optimized for bandwidth-bound applications and bit-manipulation workloads. A program-addressable memory in the NMP, the scratchpad, provides an effectively large register set that holds vectors, streams, and frequently accessed values. Because the vector registers need not be saved and restored on a context switch, the scratchpad reduces multithreading overhead and allows a simple NMP architectural design. We design an instruction set that includes vector, streaming and bit manipulation ins...
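The context-switch argument can be illustrated with a small model. This is a minimal sketch, not the NMP's actual design: it assumes the scratchpad is split into one fixed partition per hardware thread, that vector instructions name their operands by scratchpad offset, and that only a small scalar context (PC plus a few scalar registers) must be copied on a switch. The class `Scratchpad`, the function `context_switch_words`, and the sizes `VREGS`, `VLEN`, and `SCALAR_WORDS` are all hypothetical.

```python
# Hypothetical sizes chosen for illustration only; they are not taken from the NMP work.
VREGS = 32          # vector registers in a conventional register-file design
VLEN = 64           # elements per vector
SCALAR_WORDS = 8    # assumed per-thread scalar context (PC, a few scalar registers)


class Scratchpad:
    """Program-addressable memory split into one fixed partition per hardware thread (assumed layout)."""

    def __init__(self, num_threads: int, words_per_thread: int) -> None:
        self.words_per_thread = words_per_thread
        self.mem = [0] * (num_threads * words_per_thread)

    def _addr(self, tid: int, offset: int, n: int) -> int:
        # Each thread may only touch its own partition.
        if offset < 0 or offset + n > self.words_per_thread:
            raise IndexError("access outside thread partition")
        return tid * self.words_per_thread + offset

    def vload(self, tid: int, offset: int, n: int) -> list[int]:
        a = self._addr(tid, offset, n)
        return self.mem[a:a + n]

    def vstore(self, tid: int, offset: int, values: list[int]) -> None:
        a = self._addr(tid, offset, len(values))
        self.mem[a:a + len(values)] = values

    def vadd(self, tid: int, dst: int, src1: int, src2: int, n: int) -> None:
        # Vector operands are named by scratchpad offsets, not register numbers.
        x = self.vload(tid, src1, n)
        y = self.vload(tid, src2, n)
        self.vstore(tid, dst, [a + b for a, b in zip(x, y)])


def context_switch_words(vectors_in_registers: bool) -> int:
    """Words copied to save the outgoing thread and restore the incoming one."""
    per_thread = SCALAR_WORDS + (VREGS * VLEN if vectors_in_registers else 0)
    return 2 * per_thread


if __name__ == "__main__":
    sp = Scratchpad(num_threads=2, words_per_thread=3 * VLEN)
    for tid in (0, 1):
        sp.vstore(tid, 0, list(range(VLEN)))
        sp.vstore(tid, VLEN, [tid + 1] * VLEN)
    sp.vadd(0, 2 * VLEN, 0, VLEN, VLEN)  # thread 0 runs...
    sp.vadd(1, 2 * VLEN, 0, VLEN, VLEN)  # ...then thread 1; thread 0's vectors stay resident
    print(sp.vload(0, 2 * VLEN, 4), sp.vload(1, 2 * VLEN, 4))  # [1, 2, 3, 4] [2, 3, 4, 5]
    print(context_switch_words(True), context_switch_words(False))  # 4112 16
```

Under these assumed sizes, a design that keeps vectors in registers copies 4112 words per switch, while the scratchpad model copies only the 16 words of scalar context, because each thread's vectors remain in its own partition between turns.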
As the performance of DRAM devices falls further behind computing capabilities, the limitation...
Computer engineering is advancing rapidly. For 55 years, the performance of integrated circuits has ...
This paper presents an experimental study on cache memory designs for vector computers. We use an ex...
Many important scientific and engineering applications execute sub-optimally on current commodity pr...
100 p. Thesis (Ph.D.), University of Illinois at Urbana-Champaign, 2007. In the architectural aspect, ...
The conventional approach of moving data to the CPU for computation has become a significant perform...
This paper proposes both software and hardware mechanisms based on the near-memory processing (NMP) ...
The cost of transferring data between the off-chip memory system and compute unit is the fundamental...
Previous work has demonstrated soft-core vector processors in FPGAs can be applied to speed up data-...
The disparity between microprocessor clock frequencies and memory latency is a primary reason why ma...
Real-world applications are now processing big-data sets, often bottlenecked by the data movement be...
The host-multi-SIMD chip multiprocessor (CMP) architecture has been proved to be an efficient archit...
Present-day parallel computers often face the problems of large software overheads for process switc...