The objective of this research is to improve the performance of sparse problems, which have a wide range of applications but still face serious challenges when running on modern computers. In summary, the challenges are twofold: the underutilization of available memory bandwidth, caused by a lack of spatial locality, dependencies in computation, or slow mechanisms for decompressing the sparse data; and the underutilization of concurrent compute engines, caused by the distribution of non-zero values in sparse data. Our key insight for addressing these challenges is that, depending on the type of problem, we either use an intelligent reduction tree near memory to process data while gathering it from random memory locations, transf...