Relative debugging helps trace software errors by comparing two concurrent executions of a program: a reference version and a faulty one. By locating where data diverge between the runs, relative debugging is effective at finding coding errors that appear when a program is scaled up to solve larger problems or migrated from one platform to another. In this work, we envision changes to our current relative debugging scheme to address exascale factors such as increased fault rates and nondeterministic outputs. First, we propose a statistics-based comparison scheme to support verifying stochastic results. Second, we leverage a scalable data reduction network to adapt to the complex network hierarchy ...
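The two kinds of comparison described in this abstract can be illustrated with a minimal sketch. The function names `first_divergence` and `means_agree`, the tolerances, and the z-score threshold are all hypothetical choices made for illustration; the abstract does not say which statistical test or data layout the authors use, and a real relative debugger would compare live process state rather than in-memory lists.

```python
import math
import statistics


def first_divergence(reference, suspect, rel_tol=1e-9, abs_tol=0.0):
    """Return the index of the first element where the two runs disagree.

    Returns None when every pair of values agrees within the tolerances.
    """
    if len(reference) != len(suspect):
        raise ValueError("runs produced arrays of different lengths")
    for i, (r, s) in enumerate(zip(reference, suspect)):
        if not math.isclose(r, s, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    return None


def means_agree(ref_samples, sus_samples, z_threshold=3.0):
    """Compare stochastic outputs by their sample means.

    Assumption: a simple Welch-style z-score stands in for whatever
    statistical comparison the authors actually propose.
    """
    m1 = statistics.mean(ref_samples)
    m2 = statistics.mean(sus_samples)
    v1 = statistics.variance(ref_samples)
    v2 = statistics.variance(sus_samples)
    stderr = math.sqrt(v1 / len(ref_samples) + v2 / len(sus_samples))
    if stderr == 0.0:
        # Both sample sets are constant; fall back to exact comparison.
        return m1 == m2
    return abs(m1 - m2) / stderr <= z_threshold


if __name__ == "__main__":
    # Deterministic data: report where the suspect run first diverges.
    print(first_divergence([1.0, 2.0, 3.0], [1.0, 2.5, 3.0]))
    # Stochastic data: values differ element by element but agree statistically.
    print(means_agree([1.0, 1.1, 0.9, 1.0], [1.05, 0.95, 1.0, 1.1]))
```

The element-wise check suits deterministic data, where any difference beyond rounding error signals a bug. The statistical check tolerates run-to-run variation and flags only differences that are large relative to the observed spread.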
Petascale platforms with O(10^5) and O(10^6) processing cores are driving advancements in ...
Large scientific codes are constantly evolving. Refinements in understanding physical phenomena resu...
Relative debugging traces software errors by comparing two executions of a program concurrently - on...
Detecting and isolating bugs that arise only at high processor counts is a challenging task. Over a ...
Statistical debugging identifies program behaviors that are highly correlated with failures...
Because large scientific codes are rarely static objects, developers are often faced with the tediou...
This paper discusses the use of "relative debugging" as a technique for locating errors in...
Traditional debuggers are of limited value for modern scientific codes that manipulate large complex...
Relative debugging is a system which allows a programmer to compare the state of two executing progr...
Relative Debugging is a paradigm that assists users to locate errors in programs that have been corr...
Runtime verification of large-scale scientific codes is difficult because they often involve thousan...