The processor industry is at an inflection point. In the past, performance was its driving force. But in the coming many-core era, improving the programmability and reliability of the system will be at least as important as improving raw performance. To meet this vision, this thesis presents a processor feature that assists programmers in understanding software failures. Reproducing software failures is a significant challenge. The problem is especially severe for multi-threaded programs because the causes of failure can be non-deterministic in nature. The proposed processor feature continuously logs a program's execution while sacrificing very little performance (~1%). If the program crashes, the developer can u...
The ability to reproduce a parallel execution is desirable for debugging and program reliability pur...
It is a great challenge to build reliable computer systems with unreliable hardware and buggy softwa...
The ability to replay a program’s execution on a multi-processor system can significantly help parallel ...
Significant time is spent by companies trying to reproduce and fix bugs. We recently proposed a har...
Significant time is spent by companies trying to reproduce and fix bugs. BugNet is a recent architec...
Reproducing a failure is the first and most important step in debugging because it enables us to und...
Debugging software is challenging because of the increasing complexity of software and hardware, and...
Constant reduction in the size of transistors has made it possible to implement many cores on a sing...
Debugging large-scale, data-intensive, distributed applications running in a datacenter ("datacenter...
Hardware vendors are currently transitioning from single-threaded microprocessors to chips that inte...
Debugging a faulty program can be very hard and time-consuming. The programmer usually reexecutes hi...
Record and deterministic Replay (RnR) is a primitive with many proposed applications in computer sys...
Alongside the rise of multi-processor machines, concurrent programming models have grown to near ubi...
Debugging concurrent programs is known to be difficult due to scheduling non-determinism. The techni...
While a lot of work has been focused on design and programming of shared memory multi-core architect...