Estimating the speed-up of parallelized code is crucial for efficiently comparing different parallelization techniques or task graph transformations. Unfortunately, when a specification is parallelized, the information that can be extracted by profiling the corresponding sequential code (e.g. the most executed paths) is usually not properly taken into account. In particular, correlating sequential path profiling with the corresponding parallelized code can help identify code hot spots, opening new possibilities for automatic parallelization. For this reason, starting from a well-known profiling technique, Efficient Path Profiling, we propose a methodology that estimates the speed-up of a parallelized spe...
When a parallel computation is represented in a formalism that imposes series-parallel structure on ...
With the rise of Chip multiprocessors (CMPs), the amount of parallel computing power will increase s...
A parallel program can be represented as a directed acyclic graph. An important performance bound i...
Correctly estimating the speed-up of a parallel embedded application is crucial to efficiently compa...
Performance analysis of parallel programs continues to be challenging for programmers. Programmers h...
Thus far, parallelism at the loop level (or data-parallelism) has been almost exclusively the main t...
Software engineers now face the difficult task of parallelizing serial programs for parallel executi...
In this paper, we describe a model for determining the optimal data and computation decomposition fo...
We investigate an automatic method for classifying which regions of sequential programs cou...
Sequential graph algorithms are implemented through ordered execution of tasks to achieve high work ...
Traditional parallelism detection in compilers is performed by means of static analysis and more spe...
During the past decade, the degree of parallelism available in hardware has grown quickly and decisi...
We analyse the inherent performance of parallel software. To this end we use a task graph to model ...