We propose COMDETECTIVE+, an inter-thread communication analyzer, and REUSETRACKER+, a reuse distance analyzer, both of which leverage hardware features in AMD processors to support low-overhead profiling. Both tools employ the instruction-based sampling (IBS) facility and debug registers in AMD processors to detect inter-thread communication and data reuse. Unlike prior work, COMDETECTIVE+ distinguishes true sharing from false sharing, and REUSETRACKER+ measures reuse distance in private and shared caches, at low overhead, by also accounting for cache line invalidation. Both tools can attribute communication and reuse to source code lines. To our knowledge, these tools are two of the few profiling tools designed specifi...
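The two quantities named above can be illustrated with a minimal offline sketch. It is not the sampling-based mechanism of COMDETECTIVE+ or REUSETRACKER+: it assumes a complete access trace is available, a 64-byte cache line, and accesses that do not cross a line boundary, and it ignores cache line invalidation. The function names `classify_sharing` and `reuse_distances` are hypothetical and introduced only for illustration.

```python
from typing import Hashable, Iterable, List, Optional

CACHE_LINE_BYTES = 64  # assumed line size; not stated in the text


def classify_sharing(addr_a: int, size_a: int,
                     addr_b: int, size_b: int) -> Optional[str]:
    """Classify two accesses issued by different threads.

    Returns None when the accesses fall in different cache lines,
    "true" when their byte ranges overlap, and "false" when they
    touch the same line but disjoint bytes.
    """
    if addr_a // CACHE_LINE_BYTES != addr_b // CACHE_LINE_BYTES:
        return None
    overlap = addr_a < addr_b + size_b and addr_b < addr_a + size_a
    return "true" if overlap else "false"


def reuse_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """For each access, count the distinct items touched since the
    previous access to the same item (None on first use)."""
    last_seen = {}   # item -> index of its most recent access
    history = []     # full trace seen so far
    out = []
    for i, item in enumerate(trace):
        if item in last_seen:
            between = history[last_seen[item] + 1:i]
            out.append(len(set(between)))
        else:
            out.append(None)
        last_seen[item] = i
        history.append(item)
    return out
```

For example, `classify_sharing(0x1000, 4, 0x1008, 4)` reports false sharing because both accesses land in the same 64-byte line without overlapping, and `reuse_distances(["a", "b", "c", "a"])` assigns the second access to `"a"` a distance of 2. The quadratic history scan is chosen for readability; a practical analyzer would use a tree-based or sampled approach.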
The gap between processor speed and memory latency has led to the use of caches in the memory system...
Locality increasingly determines system performance. As a rigorous and precise locality model, reus...
In a multicore environment, inter-thread communication can provide valuable insights about applicat...
As multicore processors implementing shared-memory programming models have become commonplace, analy...
The performance and energy efficiency of multicore systems are increasingly dominated by the costs o...
This paper presents and validates methods to extend reuse distance analysis of application locality ...
Cache is one of the most widely used components in today's computing systems. Its performance is hea...
As computing efficiency becomes constrained by hardware scaling limitations, code optimization grows...
There is an ever-widening performance gap between processors and main memory, a gap bridged by small...
Feedback-directed optimization has become an increasingly important tool in designing and building o...
Profiling can effectively analyze program behavior and provide critical information for feedback-dir...
Understanding multicore memory behavior is crucial, but can be challenging due to the cache hierarc...
This paper proposes a methodology to study the data reuse quality of task-parallel runtime...