This paper presents ESTIMA, an easy-to-use tool for extrapolating the scalability of in-memory applications. ESTIMA is designed to perform a simple yet important task: given the performance of an application on a small machine with a handful of cores, it extrapolates the application's scalability to a larger machine with more cores, while requiring minimal input from the user. The key idea underlying ESTIMA is the use of stalled cycles, i.e., cycles that the processor spends waiting for events such as cache misses or lock acquisitions. ESTIMA measures stalled cycles on a few cores and extrapolates them to more cores, estimating the amount of waiting in the system. ESTIMA can be effectively used to predict the scalability of in-memory applic...
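The measure-then-extrapolate idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say which fitting functions ESTIMA uses, so the power-law model, the function names `fit_power_law` and `extrapolate_stalls`, and the sample measurements below are all assumptions made for illustration, not the tool's actual method.

```python
import math


def fit_power_law(cores, stalls):
    """Least-squares fit of stalls = a * cores**b, done in log-log space.

    The power-law form is an assumption for this sketch; the abstract
    does not state which model ESTIMA fits.
    """
    xs = [math.log(c) for c in cores]
    ys = [math.log(s) for s in stalls]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log space = scale factor
    return a, b


def extrapolate_stalls(a, b, target_cores):
    """Predict stalled cycles at a core count outside the measured range."""
    return a * target_cores ** b


if __name__ == "__main__":
    # Hypothetical stalled-cycle counts measured on a small machine.
    measured_cores = [1, 2, 4, 8]
    measured_stalls = [1.0e9, 2.1e9, 4.3e9, 9.0e9]

    a, b = fit_power_law(measured_cores, measured_stalls)
    for target in (16, 32, 64):
        print(target, extrapolate_stalls(a, b, target))
```

A fitted exponent `b` above 1 would suggest that waiting grows faster than the core count, which is the kind of signal that points to limited scalability on the larger machine.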
Ensuring the continuous scaling of parallel applications is challenging on many-core processors, due...
Supercomputers are used to solve some of the world’s most computationally demanding problems. Exasc...
Measurements of actual supercomputer cache performance have not been previously undertaken. PFC-Sim i...
HPC applications usually run at a low fraction of the computer's peak performance. Empirical perform...
Predicting the scalability of parallel applications is becoming crucial now that the number of cores...
Conducting a thorough performance evaluation of an STM is very time consuming. Depressingly, even wi...
A proliferation of frameworks has emerged to handle the challenges of making distributed computatio...
Using Machine Learning to yield Scalable Program Analyses. Program Analysis tackles the problem of p...
(Under the direction of Assistant Professor Dr. Frank Mueller). Over recent decades, computing speed...
Programmers are driven to parallelize their programs because of both hardware limitations and the ne...
This paper analyzes the scalability of seven system applications (...
Many applied scientific domains are increasingly relying on large-scale parallel computation. Conseq...
The current trend of increasingly larger Web-based applications makes scalability the key challenge ...
Performance engineering is a fundamental task in high-performance computing (HPC). By definition, HP...
Concurrency levels in large-scale, distributed-memory supercomputers are rising exponentially. Moder...