Massively parallel computer systems, having thousands of identical processors operating in SIMD mode, hold the promise of delivering cost-effective computing alternatives for many important problems in scientific computing. Computational linear algebra is of fundamental importance to a large class of compute-intensive algorithms. This paper discusses the implementation and performance of the computational BLAS kernels in a data-parallel setting. Two different programming languages are compared and several compiler issues are discussed.

Keywords: Data-parallel, Fortran 90, SIMD, BLAS, LAPACK

1 Data-parallel programming

This paper discusses the data-parallel programming model applied to efficient implementation of basic computational kernel...