Development of scalable application codes requires an understanding and exploitation of the locality and parallelism in the computation. This is typically achieved through optimizations by the programmer to match the application characteristics to the architectural features exposed by the parallel programming model. Partitioned address space programming models such as MPI impose a process-centric view of the parallel system, increasing the complexity of parallel programming. Typical global address space models provide a shared-memory view that greatly simplifies programming, but these simplified models abstract away locality information, precluding optimized implementations. In this work, we present techniques to reorganize program execu...
Data-parallel languages, such as High Performance Fortran or Fortran D, provide a machin...
Journal paper: Current microprocessors incorporate techniques to exploit instruction-level parallelism...
Scientific applications that operate on large data sets require huge amounts of computation power and ...
This work explores the tradeoffs of the memory system of a new massively parallel multiprocessor in ...
The development of efficient parallel out-of-core applications is often tedious, because ...
This paper describes a technique for improving the data reference locality of parallel programs usi...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/16...
Increased programmability for concurrent applications in distributed systems requires automatic supp...
The goal of parallelizing, or restructuring, compilers is to detect and exploit parallelism in seque...
Grantor: University of Toronto. This dissertation proposes and evaluates compiler techniques...
We articulate the need for managing (data) locality automatically rather than leaving it to the prog...
The goal of this dissertation is to give programmers the ability to achieve high performance by focu...
On recent high-performance multiprocessors, there is a potential conflict between the goals of achie...
While past research has discussed several advantages of multiprocessor-system-on-a-chip (MPSOC) arc...
This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/16...