Developing correct and efficient software for large-scale systems is a challenging task. Developers may overlook pathological cases in large-scale runs, employ inefficient algorithms that do not scale, or conduct premature performance optimizations that work only for small-scale runs. Such program errors and inefficiencies can result in an especially subtle class of bugs that are scale-dependent. While small-scale test cases may not exhibit these bugs, large-scale production runs may suffer failures or performance issues caused by them. Without an effective method to find such bugs, developers are forced to search through an enormous volume of logs generated in production systems to fix a scaling problem. We developed a series of statis...
With the growing use of computers in almost every aspect of our lives, software failures have greate...
Multicore and Internet cloud systems have been widely adopted in recent years and have resulted in t...
Performance bugs, i.e., program source code that is unnecessarily inefficient, have received signifi...
MPI is the de-facto standard message-passing based parallel programming model. However, the bug dete...
Petascale platforms with O(10^5) and O(10^6) processing cores are driving advancements in ...
In this document, we present our approaches for understanding and discovering scalability faults, i.e...
As today's distributed applications increase in complexity, it becomes increasingly difficult to ...
Detection, diagnosis and mitigation of performance problems in today's large-scale distributed an...
Statistical debugging identifies program behaviors that are highly correlated with failures...
There are few runtime tools for modestly sized computing systems, with 10^3 processors, and above th...
Today's largest systems have over 100,000 cores, with million-core systems expected over the next fe...
We propose a new fault localization technique for software bugs in large-scale computing systems. Ou...
This thesis is about scalable analysis and testing techniques for asynchronous programs. Due to thei...
Robust distributed systems commonly employ high-level recovery mechanisms enabling the system to re...
The Scalable Analysis Toolkit (SAT) project aimed to demonstrate that it is feasible and useful to s...