As machines and architectures have grown more complex, performance tuning has become more challenging, and general-purpose compilers often fail to generate the best possible optimized code. Expert performance programmers can often hand-write code that outperforms compiler-optimized low-level code by an order of magnitude. At the same time, programs themselves have grown more complex: modern programs are built on a variety of abstraction layers to manage complexity, yet these layers hinder optimization. In fact, it is common to lose one or two additional orders of magnitude in performance when going from a low-level language such as Fortran or C to a high-level language like Python, Ruby, or Matlab. General purpose co...
Computational GRIDs potentially offer low-cost, readily available, and large-scale high-performance ...
While parallel computing offers an attractive perspective for the future, developing efficient paral...
In high-performance computing, excellent node-level performance is required for the efficient use of...
As the demand increases for high performance and power efficiency in modern computer runtime systems...
Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple d...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
Parallel programming is a demanding task for developers partly because achieving scalable parallel s...
Structured parallel programming is one of the possible solutions to exploit Programmability, Portab...
Developing high-performance software is a difficult task that requires the use of low-level, archite...
The relentless demands for improvements in compute throughput and energy efficiency have driven...
The recent transformation from an environment where gains in computational performance came from inc...
Abstract. I consider the problem of the domain-specific optimization of programs. I review different...
The number of transistors as well as the frequency of processors have followed Moore's law for the p...