Performance tuning of non-blocking threads relies on graph partitioning algorithms that create serial code blocks from dependence graphs. Existing algorithms aim at deadlock avoidance and maximisation of run length; the latter criterion often incurs a high synchronisation overhead. This paper presents a partitioning algorithm for dependence graphs that uses a heuristic to find a cost-efficient solution based on an architecture-dependent cost function. We present empirical results from benchmark programs compiled with MIT's Id compiler, extended by our architecture-dependent partitioning algorithm. The results demonstrate a reduction in software overhead with our architecture-dependent par...
The data dependence graph is very useful to parallel algorithm design. In this paper, ap...
Pattern matching on large graphs is the foundation for a variety of application domains. The continu...
An algorithm can be modeled as an index set and a set of dependence vectors. Each index vector in th...
In this paper, we present an efficient framework for intraprocedural performance based program parti...
Existing partitioning algorithms provide limited support for load balancing simulations that are per...
This paper describes a method of analysis for detecting and minimizing memory latency using a direct...
The ordering of operations in a data flow program is not specified by the programmer, but is implied...
Three related problems, among others, are faced when trying to execute an algorithm on a parallel ma...
Current high performance computing architectures are composed of large shared memory NUMA nodes, amo...
The topic of intermediate languages for optimizing and parallelizing compilers has received much at...
In this paper we present substantially improved thread partitioning algorithms for modern implicitly...