This dissertation introduces a new metric in the area of High Performance Computing (HPC) application reliability and performance modeling. Derived via the time-dependent implementation of an existing inequality measure, the Failure index (FI) generates a coefficient representing the level of volatility for the failures incurred by an application running on a given HPC system in a given time interval. This coefficient presents a normalized cross-system representation of the failure volatility of applications running on failure-rich HPC platforms. Further, the origin and ramifications of application failures are investigated, from which certain mathematical conclusions yield greater insight into the behavior of these applications in failure-...
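The abstract does not name the underlying inequality measure, so the following is only a minimal sketch under stated assumptions. It assumes the measure is the Gini coefficient, applied to an application's failure counts over equal-width time bins in a given interval. The choice of measure, the binning scheme, and the name `failure_index` are all hypothetical. A value near 0 means failures are spread evenly across the interval. Values approaching (n-1)/n for n bins mean failures are concentrated in a few bins, which corresponds to high volatility.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on ascending data, ranks i = 1..n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


def failure_index(failure_times: Sequence[float], start: float,
                  end: float, num_bins: int) -> float:
    """HYPOTHETICAL failure index: Gini of per-bin failure counts.

    ASSUMPTION: the inequality measure is the Gini coefficient and the
    interval [start, end) is split into num_bins equal-width bins.
    """
    width = (end - start) / num_bins
    counts = [0] * num_bins
    for t in failure_times:
        if start <= t < end:  # ignore failures outside the interval
            b = min(int((t - start) / width), num_bins - 1)
            counts[b] += 1
    return gini(counts)


if __name__ == "__main__":
    # Evenly spread failures: one per bin.
    print(failure_index([0.5, 1.5, 2.5, 3.5], 0.0, 4.0, 4))  # 0.0
    # All failures in the last bin: maximal concentration for 4 bins.
    print(failure_index([3.1, 3.2, 3.3, 3.9], 0.0, 4.0, 4))  # 0.75
```

Because the Gini coefficient is scale-invariant, doubling every bin's failure count leaves the index unchanged. That property is consistent with the abstract's description of a normalized, cross-system coefficient, although the dissertation's actual normalization may differ.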
As high-performance computing (HPC) continues to progress, constraints on HPC system design force t...
Designing highly dependable systems requires a good understanding of failure characteristics. Unfort...
With petascale computers only a year or two away, there is a pressing need to anticipate and compensa...
The demand for more computational power to solve complex scientific problems has been driving the ph...
With the enormous number of computing resources in HPC and Cloud systems, failures become a major co...
2018 Summer. Includes bibliographical references. High performance computing (HPC) systems, such as da...
Supercomputers have played an essential role in the progress of science and engineering research. As...
HPC systems are widely used in industrial, economic, and scientific applications, and many of thes...
Following the growth of high performance computing systems (HPC) in size and complexity, and the adv...
An increasingly large percentage of computing capacity in today's large high-performance computing s...
The growing complexity and size of High Performance Computing systems (HPCs) lead to frequen...
As the scale of High-Performance Computing (HPC) clusters continues to grow, their increasing failur...
As the scale of High-performance Computing (HPC) systems continues to grow, researchers are devoted ...
As supercomputers become larger and more powerful, they are growing increasingly complex. This is re...
High Performance Computing (HPC) brings with it the promise of deeper insight into complex phenomen...