[[abstract]]A methodology for designing pipelined data-parallel algorithms on multicomputers is studied. The design procedure starts with a sequential algorithm which can be expressed as a nested loop with constant loop-carried dependencies. The procedure's main focus is on partitioning the loop by grouping related iterations together. Grouping is necessary to balance the communication overhead with the available parallelism and to produce pipelined execution patterns, which result in pipelined data-parallel computations. The grouping should satisfy dependence relationships among the iterations and also allow the granularity to be controlled. Various properties of grouping are studied, and methods for generating communication-efficient grou...
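The grouping idea in this abstract can be illustrated with a small sketch. The code below is a hypothetical illustration, not the paper's actual procedure: it assumes a doubly nested loop in which iteration (i, j) depends on (i-1, j) and (i, j-1) (constant loop-carried dependencies), groups iterations into column blocks of a chosen granularity, and schedules the groups in a pipelined wavefront order that respects both dependencies. All names (`grouped_wavefront`, `group_size`) are invented for the example.

```python
# Hypothetical sketch: group iterations of an n-by-m nested loop with
# constant loop-carried dependencies (i-1, j) and (i, j-1) into column
# blocks, then schedule the blocks as a pipeline. Group g owns columns
# [g*group_size, (g+1)*group_size); at pipeline step s it processes row
# i = s - g, so group g-1 has already finished row i one step earlier.

def grouped_wavefront(n, m, group_size):
    """Return a list of (step, group, row) triples giving a pipelined
    schedule that satisfies both dependence relations."""
    num_groups = (m + group_size - 1) // group_size
    schedule = []
    for step in range(n + num_groups - 1):
        for g in range(num_groups):
            i = step - g  # row processed by group g at this step
            if 0 <= i < n:
                schedule.append((step, g, i))
    return schedule
```

Making `group_size` larger trades parallelism for fewer, larger messages between neighboring groups, which is the communication/parallelism balance the abstract refers to.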
Data parallel programming provides a simple and powerful framework for designing parallel algorithms...
This session explores, through the use of formal methods, the “intuition” used in creating a paralle...
Communication overhead in multiprocessor systems, as exemplified by cache coherency traffic and glob...
[[abstract]]A systematic procedure for designing pipelined data-parallel algorithms that are suitabl...
[[abstract]]The basic concept of pipelined data-parallel algorithms is introduced by contrasting the ...
Many problems currently require more processor throughput than can be achieved with current single-p...
Current parallelization techniques, mostly based on data dependence analysis, are primarily used to ...
A set of communication operations is defined, which allows a form of task parallelism to be achieved...
The general problem studied is that of segmenting or partitioning programs for distribution across a...
A tool activity diagram is presented. The tool facilitates parallel program development by providing...
This article presents the pipeline communication/interaction pattern for concurrent, parallel and di...
[[abstract]]Efficient methods of partitioning nested for-loops for parallel execution on multicomput...
Research on programming distributed memory multiprocessors has resulted in a well-understood program...
[[abstract]]The data dependence graph (DDG) is a useful tool for the parallelism detection which is ...
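The DDG mentioned in this abstract can be sketched at the iteration level. The following is a minimal illustration, assuming a nested loop whose dependencies are given as constant distance vectors (the same setting as the partitioning abstracts above); the function name `build_ddg` and the adjacency-dict representation are choices made for the example, not the paper's formulation.

```python
# Hypothetical sketch: build the iteration-level data dependence graph
# (DDG) of an n-by-m nested loop from constant dependence distance
# vectors. Each node is an iteration (i, j); an edge u -> v means
# iteration v depends on iteration u.

def build_ddg(n, m, distance_vectors):
    """Return the DDG as a dict mapping each iteration to the set of
    iterations that depend on it."""
    ddg = {(i, j): set() for i in range(n) for j in range(m)}
    for (i, j) in ddg:
        for (di, dj) in distance_vectors:
            src = (i - di, j - dj)  # the iteration (i, j) depends on
            if src in ddg:
                ddg[src].add((i, j))
    return ddg
```

Iterations with no path between them in this graph are independent and may run in parallel, which is the parallelism-detection role the abstract ascribes to the DDG.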