There is a large space of NUMA and hardware prefetcher configurations that can significantly impact the performance of an application. Previous studies have demonstrated how a model can automatically select configurations based on the dynamic properties of the code to achieve speedups. This paper demonstrates how the static Intermediate Representation (IR) of the code can guide NUMA/prefetcher optimizations without the prohibitive cost of performance profiling. We propose a method to create a comprehensive dataset that includes a diverse set of intermediate representations along with optimum configurations. We then apply a graph neural network model in order to validate this dataset. We show that our static intermediat...
Program synthesis is a term that describes a family of techniques that enables automatic generation ...
Rapid progress in deep learning is leading to a diverse set of quickly changing models, with a drama...
Cavazos, John. It has been shown that machine-learning driven optimizations often outperform bundled o...
Both NUMA thread/data placement and hardware prefetcher configuration have significant impacts on HP...
HPC systems expose configuration options that help users optimize their applications' execution. Que...
Modern processors are equipped with multiple hardware prefetchers, each of which targets a ...
The Von Neumann bottleneck is a persistent problem in computer architecture, causing stalls and waste...
Performance bottlenecks across distributed nodes, such as in high performance computing grids or clo...
Current multi-core processors implement sophisticated hardware prefetchers that can be configu...
In recent years, machine learning (ML) and, more noticeably, deep learning (DL), have become incre...
Modern computer systems have evolved to employ powerful parallel architectures, including multi-core...
The end of Moore's law is driving the search for new techniques to improve system performance as app...
As the number of cores increases, Non-Uniform Memory Access (NUMA) is becoming increasingly prevalent...
tures are ubiquitous in HPC systems. NUMA along with other factors including socket layout, data pla...