While programmable accelerators such as application-specific processors and reconfigurable architectures can dramatically speed up compute-intensive kernels of an application, application performance can still be severely limited by the communication between processors. To minimize the communication overhead, a shared memory such as a scratchpad memory may be employed between the main processor and the accelerator coprocessor. However, this setup poses a significant challenge to the main processor, which now must manage data on the scratchpad explicitly, resulting in superfluous data copying due to the inflexibility of a scratchpad. In this article, we present an enhancement of a scratchpad, Configurable Range Memory (CRM), whose address ra...
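To make the copying overhead described above concrete, here is a minimal, hypothetical Python model. It is not the CRM design, whose details are cut off in this excerpt. It assumes a fixed-size scratchpad that the host and a stand-in accelerator kernel both access, with the host explicitly copying every tile in and out. The names `Scratchpad`, `accelerator_scale`, and `run_pipeline` are illustrative only.

```python
class Scratchpad:
    """Hypothetical fixed-size on-chip buffer shared by host and accelerator."""

    def __init__(self, size):
        self.size = size
        self.data = [0] * size
        self.elements_copied = 0  # total elements moved between main memory and SPM

    def copy_in(self, src, offset=0):
        # Host explicitly stages data from main memory into the scratchpad.
        if offset + len(src) > self.size:
            raise ValueError("data does not fit in scratchpad")
        self.data[offset:offset + len(src)] = src
        self.elements_copied += len(src)

    def copy_out(self, offset, length):
        # Host explicitly moves results back to main memory.
        self.elements_copied += length
        return self.data[offset:offset + length]


def accelerator_scale(spm, offset, length, factor):
    # Stand-in for an accelerated kernel: it only touches scratchpad contents.
    for i in range(offset, offset + length):
        spm.data[i] *= factor


def run_pipeline(main_memory, spm, tile):
    """Run two kernel stages, each streaming every tile through the scratchpad."""
    result = list(main_memory)
    for _stage in range(2):
        for start in range(0, len(result), tile):
            chunk = result[start:start + tile]
            spm.copy_in(chunk)
            accelerator_scale(spm, 0, len(chunk), 2)
            result[start:start + len(chunk)] = spm.copy_out(0, len(chunk))
    return result


spm = Scratchpad(4)
print(run_pipeline([1, 2, 3, 4, 5, 6, 7, 8], spm, tile=4))  # [4, 8, 12, 16, 20, 24, 28, 32]
print(spm.elements_copied)  # 32
```

In this toy run, the intermediate result of the first stage makes a full round trip through main memory before the second stage, so 32 elements are copied. Running both stages on each tile while it is still resident would copy only 16, which illustrates the kind of superfluous copying the abstract attributes to an inflexible scratchpad.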
This paper presents a compiler strategy to optimize data accesses in regular array-intensiv...
This paper presents the first memory allocation scheme for embedded systems having scratch-...
In this paper we address the problem of on-chip memory selection for computationally intensive appl...
Application-specific hardware and reconfigurable processors can dramatically speed up compute-intens...
Scratchpad memory has been introduced as a replacement for cache memory as it improves the performan...
Coarse-Grained Reconfigurable Architecture (CGRA) in a hybrid system can significantly accelerate th...
A method to both reduce energy and improve performance in a processor-based embedded syste...
In this paper, we propose a methodology for energy reduction and performance improvement. The target...
Increasing demand for power-efficient, high-performance computing has spurred a growing number and d...
In order to meet the requirements concerning both performance and energy consumption in embedded sy...
This thesis focuses on the acceleration of different applications using a run-time reconfigurable ar...
We propose a code scratchpad memory (SPM) management technique with demand paging for embed...
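The excerpt does not state which replacement policy the proposed technique uses, so the sketch below is only a generic illustration of demand paging for a code scratchpad. Instruction addresses are mapped to fixed-size code pages, a page is copied into one of a small number of SPM frames the first time it is fetched, and, as an assumption, the least recently used page is evicted when all frames are full. `CodeSPMPager` and its parameters are hypothetical.

```python
from collections import OrderedDict


class CodeSPMPager:
    """Hypothetical demand-paging model for a code SPM with LRU replacement (assumed policy)."""

    def __init__(self, num_frames, page_size):
        self.num_frames = num_frames
        self.page_size = page_size
        self.resident = OrderedDict()  # page number -> SPM frame, least recently used first
        self.faults = 0                # each fault = one page copied from off-chip memory

    def fetch(self, pc):
        page = pc // self.page_size
        if page in self.resident:
            self.resident.move_to_end(page)  # hit: mark page as most recently used
            return self.resident[page]
        self.faults += 1  # miss: the page must be loaded into the SPM on demand
        if len(self.resident) < self.num_frames:
            frame = len(self.resident)  # a free frame is still available
        else:
            _victim, frame = self.resident.popitem(last=False)  # evict LRU page, reuse its frame
        self.resident[page] = frame
        return frame


pager = CodeSPMPager(num_frames=2, page_size=256)
frames = [pager.fetch(pc) for pc in (0, 10, 300, 20, 600, 310)]
print(frames, pager.faults)  # [0, 0, 1, 0, 1, 0] 4
```

With only two frames, this six-fetch trace touches three distinct pages and incurs four page loads, because page 1 is evicted and then needed again.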
Exploiting runtime memory access traces can be a complementary approach to compiler optimiz...