Detection, diagnosis and mitigation of performance problems in today\u27s large-scale distributed and parallel systems is a difficult task. These large distributed and parallel systems are composed of various complex software and hardware components. When the system experiences some performance or correctness problem, developers struggle to understand the root cause of the problem and fix in a timely manner. In my thesis, I address these three components of the performance problems in computer systems. First, we focus on diagnosing performance problems in large-scale parallel applications running on supercomputers. We developed techniques to localize the performance problem for root-cause analysis. Parallel applications, most of which are c...
The amount of parallelism in modern supercomputers currently grows from generation to generation, an...
Diagnosing performance problems in modern datacenters and distributed systems is challenging, as the...
Robust distributed systems commonly employ high-level recov-ery mechanisms enabling the system to re...
Large production systems are susceptible to chronic performance problems where the system still work...
As today\u27s distributed applications increase in complexity, it becomes increasingly difficult to ...
Scienti c parallel programs often undergo signicant performance tuning before meeting their performa...
This paper discusses a methodology for diagnosing performance problems for parallel and distributed ...
Developing correct and efficient software for large scale systems is a challenging task. Developers ...
peer-reviewedThe shift towards multicore processing has led to a much wider population of developer...
Failures in computing systems are unavoidable. Therefore, it is important to detect and diagnose fai...
The amount of parallelism in modern supercomputers currently grows from generation to generation. Fu...
International audienceTo efficiently exploit the resources of new many-core architectures, integrati...
Performance problems commonly exist in many kinds of real-world applications, including smartphone a...
Abstract. Writing multithreaded software for multicore computers con-fronts many developers with the...
Parallel performance tuning naturally involves a diagnosis process to locate and explain sources of ...
The amount of parallelism in modern supercomputers currently grows from generation to generation, an...
Diagnosing performance problems in modern datacenters and distributed systems is challenging, as the...
Robust distributed systems commonly employ high-level recov-ery mechanisms enabling the system to re...
Large production systems are susceptible to chronic performance problems where the system still work...
As today\u27s distributed applications increase in complexity, it becomes increasingly difficult to ...
Scienti c parallel programs often undergo signicant performance tuning before meeting their performa...
This paper discusses a methodology for diagnosing performance problems for parallel and distributed ...
Developing correct and efficient software for large scale systems is a challenging task. Developers ...
peer-reviewedThe shift towards multicore processing has led to a much wider population of developer...
Failures in computing systems are unavoidable. Therefore, it is important to detect and diagnose fai...
The amount of parallelism in modern supercomputers currently grows from generation to generation. Fu...
International audienceTo efficiently exploit the resources of new many-core architectures, integrati...
Performance problems commonly exist in many kinds of real-world applications, including smartphone a...
Abstract. Writing multithreaded software for multicore computers con-fronts many developers with the...
Parallel performance tuning naturally involves a diagnosis process to locate and explain sources of ...
The amount of parallelism in modern supercomputers currently grows from generation to generation, an...
Diagnosing performance problems in modern datacenters and distributed systems is challenging, as the...
Robust distributed systems commonly employ high-level recov-ery mechanisms enabling the system to re...