Mechanistic processor performance modeling builds an analytical model from an understanding of the underlying mechanisms in the processor. It provides fundamental insight into program-microarchitecture interactions, as well as into the scaling trends of microarchitecture structures and their interactions. Whereas prior work in mechanistic performance modeling focused on superscalar out-of-order processors, this paper presents a mechanistic performance model for superscalar in-order processors. We find mechanistic modeling to be more challenging for in-order processors than for out-of-order processors because the latter are designed to hide latencies, and hence, from a modeling perspective, detailed modeling of instruction execution latencies and dependencies is not...
Processor microarchitectures have been evolving and getting sophisticated to meet increasing c...
Design space exploration (DSE) is a key ingredient of system-level design, enabling designers to qui...
Cycles per Instruction (CPI) stacks break down processor execution time into a baseline CPI plus a n...
Superscalar in-order processors form an interesting alternative to out-of-order processors because o...
A mechanistic model for out-of-order superscalar processors is developed and then applied to the stu...
A proposed performance model for superscalar processors consists of 1) a component that models the r...
Optimizing processors for (a) specific application(s) can substantially improve energy-efficiency. W...
Fast and accurate processor simulation is essential in processor design. Trace-driven simulation ...
This paper proposes an analytical model to predict Memory-Level Parallelism (MLP) in a superscalar p...
Understanding the performance impact of compiler optimizations on superscalar processors is complica...
The main aim of this short paper is to investigate multiple-instruction-issue in a high-performance ...
Optimizing processors for specific application(s) can substantially improve energy-efficiency. With ...
A common way of representing processor performance is to use Cycles per Instruction (CPI) "stacks" w...