Abstract: In this paper, a processing element (PE) is characterized by its computation bandwidth, I/O bandwidth, and the size of its local memory. In carrying out a computation, a PE is said to be balanced if the computing time equals the I/O time. Consider a balanced PE for some computation. Suppose that the computation bandwidth of the PE is increased by a factor of α relative to its I/O bandwidth. Then, when carrying out the same computation, the PE will be imbalanced; i.e., it will have to wait for I/O. A standard method of avoiding this I/O bottleneck is to reduce the overall I/O requirement of the PE by increasing the size of its local memory. This paper addresses the question of by how much the PE's local memory must be enlarged in orde...
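The balance condition described in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes a blocked n x n matrix multiplication in which three b x b blocks fit in local memory and each block product streams two input blocks, so the I/O volume falls roughly as 1/b. The function names, the cost model, and the sample numbers are all hypothetical.

```python
import math


def block_side(memory_words: int) -> int:
    """Largest b such that three b x b blocks fit in local memory (assumed layout)."""
    return math.isqrt(memory_words // 3)


def compute_time(n: int, compute_bw: float) -> float:
    """Time for the 2*n^3 arithmetic operations of an n x n matrix multiply."""
    return 2 * n**3 / compute_bw


def io_time(n: int, memory_words: int, io_bw: float) -> float:
    """Time to move the input words under the assumed blocked schedule.

    There are (n/b)^3 block products, each reading two b x b blocks,
    which gives about 2*n^3/b words in total.
    """
    b = block_side(memory_words)
    return (2 * n**3 / b) / io_bw


def is_balanced(n: int, memory_words: int, compute_bw: float, io_bw: float) -> bool:
    """A PE is balanced when computing time equals I/O time."""
    return math.isclose(compute_time(n, compute_bw),
                        io_time(n, memory_words, io_bw))


def memory_for_balance(compute_bw: float, io_bw: float) -> int:
    """Local memory (in words) that restores balance in this model.

    Balance requires b == compute_bw / io_bw, hence memory = 3 * b^2.
    """
    b = math.ceil(compute_bw / io_bw)
    return 3 * b * b


if __name__ == "__main__":
    n, io_bw = 1024, 1.0
    base_bw = 64.0
    mem = memory_for_balance(base_bw, io_bw)
    print("balanced at baseline:", is_balanced(n, mem, base_bw, io_bw))

    alpha = 2.0  # computation bandwidth grows by alpha relative to I/O
    print("balanced after speed-up, same memory:",
          is_balanced(n, mem, alpha * base_bw, io_bw))
    print("memory growth needed in this model:",
          memory_for_balance(alpha * base_bw, io_bw) / mem)
```

Under these assumptions, local memory must grow by about α² to restore balance for this particular computation. Other computations would scale differently, and the sketch makes no claim about the bounds derived in the paper itself.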
Contains fulltext: 240925.pdf (Publisher’s version) (Open Access), 16 p
The objective of high performance computing (HPC) is to ensure that the computational power of hardw...
The communication and synchronization overhead inherent in parallel processing can lead to situation...
Abstract: A processor is balanced in carrying out a computation if its computing time equals its I/O t...
The issues to be addressed here are those of "balance" in machine architecture. By this, we mean how...
In this paper we propose a new approach to the study of the communication requirements of distribute...
Designers of parallel computers have to decide how to apportion a machine's resources between p...
This paper discusses the importance of memory access optimizations which are shown to be highly effe...
Most research in algorithms aims to reduce computational time complexity. Such research...
Abstract: The PRAM model of parallel computation is examined with respect to wordsize, the number of b...
The growing importance of, and interest in, parallel processing within computer science are undeniable, ...
Recently there has been an increasing interest in models of parallel computation that account for th...
This paper shows that a fat-pyramid of area Theta(A) built from processors of size lg A requires onl...
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been impleme...