Abstract—As modern processors become increasingly complex, fast and accurate performance prediction is crucial during the early phases of hardware and software co-development. Accurately and efficiently predicting the performance of a given software workload is, however, a challenging problem. Traditional cycle-accurate simulation is often too slow, while analytical models are not sufficiently accurate or still require target-specific execution statistics that may be slow or difficult to obtain. In this paper, we propose a novel learning-based approach for synthesizing analytical models that can accurately predict the performance of a workload on a target platform from various performance statistics obtained directly on a host platfo...
Accurate workload prediction and throughput estimation are keys in efficient proactive power and per...
The ongoing trend of increasing computer hardware and software complexity has resulted in the increa...
Standard benchmarking provides the run times for given programs on given machines, but fails to prov...
Under growing complexity and time-to-market pressures of modern computer systems, agile co-developme...
The cycle-accurate simulation is a method for design space study of a processor system before it goe...
CPUs and dedicated accelerators (namely GPUs and FPGAs) continue to grow increasingly large and comp...
Abstract—The microarchitectural design space of a new processor is too large for an architect to eva...
Analytical performance modeling is a useful complement to detailed cycle-level simulation to quickly...
Cycle-accurate simulation is far too slow for modeling the expected performance of full parallel app...
Developing an optimizing compiler for a newly proposed architecture is extremely difficult when ther...
Abstract. Multicore architectures featuring specialized accelerators are getting an increasing amoun...
Scientific applications often require massive amounts of compute time and power. With the constantly...
Accurately modeling and predicting performance for large-scale applications becomes increasingly dif...
Design space exploration of a processor system, prior to its hardware implementation, usually involv...