Abstract. The automatic parallelization of sequential applications is a great challenge for current compiler technology. The partitioning of a sequential application into parallel programs that can be executed concurrently on a given parallel architecture is a complex and time-consuming undertaking. In addition, the programmer is often responsible for defining a good partitioning that takes into account the properties of both the program and the architecture. This paper proposes a new fully automated partitioning algorithm driven by an intermediate representation of the sequential application in terms of the domain-independent concept-level kernels (e.g., induction, reduction, recurrence) recognized by the XARK compiler framework. Such...
This is a post-peer-review, pre-copyedit version of an article published in ACM Transactions on Prog...
In order to utilize parallel computers, four approaches, broadly speaking, to the provision of paral...
On current multiprocessor architectures one must carefully distribute data in memory in order to ach...
The widespread use of multicore processors is not a consequence of significant advances in parallel ...
Parallel computing hardware is affordable and accessible, yet parallel programming is not as widespr...
In this paper we present substantially improved thread partitioning algorithms for modern implicitly...
This manuscript summarizes the main ideas introduced in [1]. We propose a compiler that automaticall...
εm is a high-level programming system which puts parallelism within the reach of scientists who are ...
Current parallelization techniques, mostly based on data dependence analysis, are primarily used to ...
There is a trend towards using accelerators to increase performance and energy efficiency of general...
Several researchers have looked into various issues related to automatic parallelization of sequenti...
In this paper, we present an efficient framework for intraprocedural performance based program parti...
The general problem studied is that of segmenting or partitioning programs for distribution across a...