ABSTRACT: This article proposes a novel approach for synchronizing, a posteriori, the detailed execution traces from several networked computers. It can be used to debug and investigate complex performance problems in systems where several computers exchange information. While the distributed system is under study, detailed execution traces are generated locally on each system using LTTng, an efficient and accurate system-level tracer. Once tracing is finished, the individual traces are collected and analysed together. The messaging events in all the traces are then identified and correlated in order to estimate how the time offset between each pair of nodes evolves over time. The time offset computation imprecision, associated with asymmetric network delays and ...
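To illustrate the general idea of estimating a clock offset from correlated messaging events, here is a minimal sketch. It is not the article's algorithm: it assumes a constant offset between two nodes (no drift), non-negative network delays, and already-matched send/receive pairs. The function name `estimate_offset` and the tuple layout are hypothetical, chosen only for this example.

```python
from typing import List, Tuple

# Each pair is (send_timestamp, receive_timestamp); the send is stamped by the
# sender's local clock and the receive by the receiver's local clock.
Pair = Tuple[int, int]


def estimate_offset(a_to_b: List[Pair], b_to_a: List[Pair]) -> Tuple[float, float]:
    """Bound offset = clock_B - clock_A from matched message pairs.

    Hypothetical helper for illustration only. Assumes a constant offset
    and non-negative one-way delays.
    Returns (estimate, imprecision) in the same unit as the timestamps.
    """
    if not a_to_b or not b_to_a:
        raise ValueError("need at least one message in each direction")

    # A -> B: recv_B - send_A = delay + offset, so offset <= min(recv - send).
    upper = min(recv - send for send, recv in a_to_b)
    # B -> A: recv_A - send_B = delay - offset, so offset >= -min(recv - send).
    lower = -min(recv - send for send, recv in b_to_a)

    estimate = (upper + lower) / 2
    # Half the width of the feasible interval; it shrinks as the fastest
    # messages in each direction get closer to zero delay.
    imprecision = (upper - lower) / 2
    return estimate, imprecision


if __name__ == "__main__":
    # Made-up timestamps in nanoseconds, for demonstration only.
    a_to_b = [(0, 5300), (1000, 6200)]
    b_to_a = [(10000, 5400), (11000, 6200)]
    print(estimate_offset(a_to_b, b_to_a))
```

The midpoint is only the true offset when the fastest delays in both directions are equal; with asymmetric delays the true value can lie anywhere inside the interval, which is why the half-width is reported as the imprecision.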
One of the most challenging problems facing today's software engineer is to understand and modify di...
ABSTRACT: Debugging and profiling tools can alter the execution flow or timing, can induce heisenbug...
This paper is the second part of a two part paper that documents a detailed survey of the research c...
Supercomputing is a key technological pillar of modern science and engineering, indispensable for so...
ABSTRACT: We propose a new class of profiler for distributed and heterogeneous systems. In these sys...
ABSTRACT: Most computers have several high-resolution timing sources, from the programmable interrup...
Stragglers, which are tasks that operate significantly slower than other tasks in a system, are a bi...
ABSTRACT: Tracing allows the analysis of task interactions with each other and with the operating sy...
This paper will examine aspects related to network synchronization distribution and the cascading of...
Techniques are described herein for maintaining real-time software clocks in a distributed network. ...
It has been shown by Freris, Graham and Kumar that clocks in distributed networks cannot be synchron...
This dissertation investigates the question: How do we precisely access and control time in a networ...
A distributed system consists of a set of processors that communicate by message transmission and th...
Developers and users of high-performance distributed systems often observe performance problems such...
Abstract—Event traces are valuable for understanding the behavior of parallel programs. However, aut...