Big data is prevalent in HPC. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often run on thousands of CPU cores and perform simultaneous data accesses, data movements, and computation. Analyzing their performance is challenging: the workflow data and the measurement data collected from their executions can reach terabytes or petabytes, spread over a large number of nodes and many parallel task executions. To help identify performance bottlenecks and debug performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework using state-of-the-art open-source big data processing tools. Our t...
Big Data analytics frameworks (e.g., Apache Hadoop and Apache Spark) have been...
Executing Big Data workloads upon High Performance Computing (HPC) infrastract...
Large-scale computer clusters have in recent years become dominant for performing computations in ...
Large science projects rely on complex workflows to analyze terabytes or petabytes of data....
Scientific data generated at experimental and observational facilities are increasingly being proces...
Modern parallel systems and applications are constantly increasing in scale and complexity, and cons...
Current large-scale HPC systems consist of complex configurations with a huge number of potentially ...
Big data processing has recently gained a lot of attention from both academia and industry. The term...
Big Data analysis poses great challenges in practice. Data set sizes will grow quickly, and anal...
Performance analysis tools allow application developers to identify and characterize the inefficienc...
HPC systems and parallel applications are growing in complexity. Therefore, the possibility of ...
High performance computing (HPC) and Big Data are technologies vital for advancement in science, bus...
In recent years, big data has emerged as a universal term, and its management has become a crucial res...
With larger and larger systems being constantly deployed, trace-based performance analysis of paral...
In this paper, we present the design and analyze the performance-energy characteristics of a softwar...