This paper presents a modeling method particularly suited to analyzing interactions between Message Passing Interface (MPI) library execution and a distributed cache coherence protocol. The method is applied to the Ping-Pong benchmark. In addition to overall performance figures such as message exchange latency, it provides detailed analysis elements such as cache miss counts per variable. It is based on formal modeling in which functional and performance aspects are integrated by composition and can be refined independently. A key modeling point is that, under the cache coherence protocol, the duration of an access to a variable is not static but state-dependent. Our Ping-Pong model allows comparison of diff...
To facilitate programming, most multi-core processors feature automated mechanisms maintaining coher...
Moving data between processes has often been discussed as one of the major bottlenecks in parallel c...
We compare the performance of three major programming models: a load-store cache-coherent shared add...
Shared memory MPI communication is an important part of the overall performanc...
Cache coherence is one of the main challenges to tackle when designing a shared-memory multiprocesso...
The range of high-end servers designed and manufactured by Bull includes cache-coherent distributed ...
Collection of computational artifacts (source code, scripts, datasets, instructions) for reproducib...
This paper addresses the problem of evaluating the performance of multiprocessors with shared memory ...
The use of private caches in a multiprocessor system causes inconsistency of the shared data among t...
We present an analytical model of a cache coherent shared-memory multiprocessor and compare the resu...
In this paper we present a cache coherence protocol for multistage interconnection network (MIN)-bas...
We develop an analytical model of a multiprocessor with private caches and shared memory and obtain th...