The growing gap between sustained and peak performance for scientific applications is a well-known problem in high-end computing. The recent development of parallel vector systems offers the potential to bridge this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of scientific computing areas. First, we present the performance of a microbenchmark suite that examines low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computin...
Abstract The last decade has witnessed a rapid proliferation of superscalar cache-based microprocess...
In-memory column-store database systems are state of the art for the efficient processing of analyti...
Scientific programs are typically characterized as floating-point intensive loop-dominated tasks wit...
The growing gap between sustained and peak performance for scientific applications has become a well...
The growing gap between sustained and peak performance for scientific applications has become a well...
The growing gap between sustained and peak performance for scientific applications is a well-known p...
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to bu...
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to b...
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to bu...
The growing gap between sustained and peak performance for scientific applications is a well-known p...
This paper presents an experimental study on cache memory designs for vector computers. We use an ex...
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to b...
Abstract. The last decade has witnessed a rapid proliferation of superscalar cache-based microproces...
In this paper, we use execution-driven simulation to study and compare vector processing performance...
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to bui...