Abstract: It is proposed to enhance and simplify the programming of a two-dimensional (2-D) torus-connected (or mesh-connected) SIMD array of simple processing elements (PEs) by introducing two dedicated communication registers in each PE. A new SIMD algorithm to transpose a matrix using only two buffers at each PE is described. A method is proposed to effectively realize a large number of arbitrary, one-to-one, personalized, concurrent communications between the PEs by suitably repeating the matrix transpose algorithm. This approach improves the implementation of several shift-variant image processing tasks that involve such communication, such as the Hough transform, histogramming, and median filtering. The dynamic behavior of such a SIMD impleme...
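The abstract does not spell out its transpose algorithm. The following Python sketch shows one way two communication buffers per PE can be enough. It is our own construction and not necessarily the authors' method. It simulates an n × n torus with one matrix element per PE and rests on these assumptions:

- One buffer (`up_buf`) moves every value one hop down and one hop left per round.
- The other buffer (`lo_buf`) moves every value one hop up and one hop right.
- PE (r, c) latches the arriving value once the round count equals its distance from the diagonal.

The names `torus_transpose`, `shift`, `up_buf`, and `lo_buf` are hypothetical.

```python
def shift(plane, dr, dc):
    """Torus shift of a whole buffer plane: the value held at (r, c)
    moves to ((r + dr) % n, (c + dc) % n), like one SIMD neighbor transfer."""
    n = len(plane)
    return [[plane[(r - dr) % n][(c - dc) % n] for c in range(n)] for r in range(n)]


def torus_transpose(a):
    """Transpose a square matrix held one element per PE, using two buffers per PE.

    Hypothetical sketch: after k rounds, up_buf at PE (r, c) holds a[r-k][c+k]
    and lo_buf holds a[r+k][c-k] (indices mod n). The element destined for
    (r, c), namely a[c][r], is therefore in up_buf when r - c == k and in
    lo_buf when c - r == k.
    """
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("square matrix expected")
    # Diagonal elements are already in place.
    out = [[a[r][c] if r == c else None for c in range(n)] for r in range(n)]
    up_buf = [row[:] for row in a]  # values travelling down-and-left
    lo_buf = [row[:] for row in a]  # values travelling up-and-right
    for k in range(1, n):
        # Two single-hop transfers per buffer: mesh links only reach neighbors.
        up_buf = shift(shift(up_buf, 1, 0), 0, -1)  # down, then left
        lo_buf = shift(shift(lo_buf, -1, 0), 0, 1)  # up, then right
        # Masked latch: only PEs at distance k from the diagonal copy a value.
        for r in range(n):
            for c in range(n):
                if r - c == k:
                    out[r][c] = up_buf[r][c]
                elif c - r == k:
                    out[r][c] = lo_buf[r][c]
    return out


a = [[4 * r + c for c in range(4)] for r in range(4)]
print(torus_transpose(a))  # [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]
```

Every round shifts a whole buffer plane in lockstep, so no two values ever compete for the same buffer slot. The transpose finishes after 2(n - 1) neighbor hops per buffer. The latch is the only PE-dependent (masked) step.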
Abstract: Current SIMD extensions have proved to be effective for increasing the performance of gen...
Algorithms for interpreting the motion of edge features in an image sequence in terms of the positio...
A SIMD scheme for parallelization of the 2-D array operation M(x) = (D×A + B×I + V) x is developed f...
A general radix-2 FFT algorithm was recently developed and implemented for modern Single Instruction...
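The abstract is cut off before it describes its algorithm. For orientation only, here is a textbook iterative radix-2 Cooley-Tukey FFT in Python. It is written so that the structure relevant to SIMD is visible. Within each of the log2(n) stages, all n/2 butterflies are independent and apply the same operation, which is what allows a SIMD implementation to process several of them per instruction. This is a generic sketch, not the algorithm from the cited work. `fft_radix2` is a hypothetical name, and the input length is assumed to be a power of two.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT (generic sketch).

    Assumes len(x) is a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation of the input.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) butterfly stages.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        # Every butterfly in this stage is independent. The loops over
        # `start` and `k` are what a SIMD unit could spread across lanes.
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a
```

A vectorized version would typically precompute the twiddle factors `w` per stage instead of accumulating them by repeated multiplication. That lets each lane load its factor directly.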
SIMD (single instruction multiple data)-type processors have been found very efficient in image proc...
Existing techniques for mapping image data onto the processors of a SIMD machine are suitable for al...
We develop efficient algorithms for low and intermediate level image processing on the scan line arr...
This paper examines the applicability of fine-grained tree-structured SIMD machines, which are amena...
In this paper, we examine the implementation of two middle-level image understanding tasks on fine-g...
Streaming SIMD Extensions (SSE) is a unique feature embedded in the Pentium III and Pentium IV class...
This paper presents an SIMD machine which has been tuned to execute low-level vision algorithms empl...
We explore vectorised implementations, exploiting single instruction multiple data (SIMD) CPU instru...
A heterogeneous Multiple-SIMD (M-SIMD) architecture is used to analyse image sequences by integrati...
algorithm is reported, and demonstrated in use for interpreting medical images. The SIMD machine act...