Applications in many fields, such as machine learning, scientific computing, and signal/image processing, must deal with real-world input datasets. Such datasets are usually discrete samples of slowly changing, continuous physical phenomena, like temperature maps and images. Because of this continuity, the datasets often contain data points with similar or even identical values, which leads to repeated operations on the same or similar data points, i.e., redundant computation. Redundant computation can be exploited to improve the energy/power efficiency and performance of processors, especially as the benefits of process technology scaling and power scaling keep diminishing...
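To make the idea of redundant computation concrete, the following is a minimal software-level sketch (not taken from any of the papers listed below) of value-based computation reuse on slowly varying sensor data. The function names `expensive_op` and `process`, and the quantization-by-tolerance scheme, are illustrative assumptions: identical or near-identical samples hit a small cache instead of re-executing the costly kernel.

```python
# Minimal sketch (illustrative only): reuse results of a costly per-sample
# kernel when input values repeat, as they do for slowly changing signals.
import math

def expensive_op(x):
    # Stand-in for a costly per-sample computation (e.g., a filter or feature).
    return math.exp(-x * x) * math.sin(x)

def process(samples, tolerance=1e-3):
    """Apply expensive_op to every sample, reusing cached results for values
    that repeat within `tolerance`."""
    cache = {}
    results = []
    hits = 0
    for x in samples:
        key = round(x / tolerance)      # map similar values to the same key
        if key in cache:
            hits += 1                   # redundant computation skipped
            results.append(cache[key])
        else:
            y = expensive_op(x)
            cache[key] = y
            results.append(y)
    return results, hits

# Example: a slowly drifting "temperature" trace with many repeated values.
trace = [20.0 + 0.001 * (i // 100) for i in range(1000)]
out, reused = process(trace)
print(f"reused {reused} of {len(trace)} computations")
```

On such a trace most iterations hit the cache, which is the kind of redundancy that hardware- or compiler-level reuse techniques aim to exploit for energy and performance.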
Energy consumption is one of the top challenges for achieving the next generation of supercomputing....
Sparse and irregular computations constitute a large fraction of applications in the data-intensive ...
This paper addresses the efficient exploitation of task-level parallelism, present in many dense lin...
It is commonplace for graphics processing units or GPUs today to render extremely complex 3D scenes ...
With processor clock speeds having stagnated, parallel computing architectures have achieved a break...
Many modern workloads such as multimedia, recognition, mining, search, vision, etc. possess the char...
Moore’s law is dead. The physical and economic principles that enabled an exponential rise in transi...
The saturation of single-thread performance, along with the advent of the power wall, has resulted i...
Faster and more efficient hardware is needed to handle the rapid growth of Big Data processing. Appl...
To avoid immoderate power consumption, the chip industry has shifted away from high-performance singl...
"Inexact computing" provides an opportunity for exploiting application characteristics to improve en...
Thread-parallel hardware, such as Graphics Processing Units (GPUs), greatly outperforms CPUs in provid...
Data analytics for streaming sensor data brings challenges for the resource efficiency of algorithms...
The massive parallelism provided by general-purpose GPUs (GPGPUs) possessing numerous compute thread...