Despite the success of parallel architectures and domain-specific accelerators in boosting the performance of emerging parallel workloads, contemporary computer organizations still face the bottleneck of data movement between processors and main memory. Processing-in-memory (PIM) architectures, especially designs that integrate compute logic near DRAM banks, are a promising way to address this bottleneck. However, such in-DRAM near-bank integration faces hardware and software design challenges in performance, area overhead, architectural complexity, and programmability. To address these challenges, this dissertation focuses on developing efficient hardware and software solutions for in-DRAM near-bank computing. First, this disser...
Many high performance applications run well below the peak arithmetic performance of the underlying...
Today, computing-centric von Neumann architectures face strong limitations in t...
Many data-intensive applications exhibit poor temporal and spatial locality and perform poorly on co...
The explosive increase in data volume in emerging applications poses grand challenges to computing s...
As the performance of DRAM devices falls more and more behind computing capabilities, the limitation...
General purpose processors and accelerators including system-on-a-chip and graphics processing units...
For the past two decades, the scaling of main memory has lagged behind the advancement of computation in a...
Recent years have witnessed a rapid growth in the amount of generated data, owing to the emergence o...
Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally m...
While the compute part has kept scaling for decades, it has become more and more difficult for the memor...
The limitations of DRAM technology in terms of energy consumption and bandwidth pose a serious prob...
Conventional compute and memory systems scaling to achieve higher performance and lower cost and pow...
The exponential growth of the dataset size demanded by modern big data applications requires innovat...
Processing-using-memory (PuM) techniques leverage the analog operation of memory cells to perform co...
Many high performance applications run well below the peak arithmetic performance of the underlying ...