Deployment of a distributed deep learning technology stack on a large parallel system is a very complex process, involving the integration and configuration of several layers of both, general-purpose and custom software. The details of such kind of deployments are rarely described in the literature. This paper presents the experiences observed during the deployment of a technology stack to enable deep learning workloads on MareNostrum, a petascale supercomputer. The components of a layered architecture, based on the usage of Apache Spark, are described and the performance and scalability of the resulting system is evaluated. This is followed by a discussion about the impact of different configurations including parallelism, storage and netw...
Deep learning becomes a hot topic recently in various areas, from industry to academia. More and m...
Continuously increasing data volumes from multiple sources, such as simulation and experimental meas...
Machine learning has established itself as a powerful tool for the construction of decision making m...
Deployment of a distributed deep learning technology stack on a large parallel system is a very comp...
Neural networks are becoming more and more popular in scientific field and in the industry. It is mo...
The rapid growth of data and ever increasing model complexity of deep neural networks (DNNs) have en...
peer reviewedWith renewed global interest for Artificial Intelligence (AI) methods, the past decade ...
Deep learning has been postulated as a solution for numerous problems in different branches of scien...
The effective utilization at scale of complex machine learning (ML) techniques for HEP use cases pos...
In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a peta...
As the models and the datasets to train deep learning (DL) models scale, system architects are faced...
For certain problems, training deep artificial neural networks can require far more compute resource...
Deep Neural Networks (DNNs) enable computers to excel across many different applications such as ima...
Abstract—In this paper we present a framework to enable data-intensive Spark workloads on MareNostru...
Deep neural networks are trained by solving huge optimization problems with large datasets and milli...
Deep learning becomes a hot topic recently in various areas, from industry to academia. More and m...
Continuously increasing data volumes from multiple sources, such as simulation and experimental meas...
Machine learning has established itself as a powerful tool for the construction of decision making m...
Deployment of a distributed deep learning technology stack on a large parallel system is a very comp...
Neural networks are becoming more and more popular in scientific field and in the industry. It is mo...
The rapid growth of data and ever increasing model complexity of deep neural networks (DNNs) have en...
peer reviewedWith renewed global interest for Artificial Intelligence (AI) methods, the past decade ...
Deep learning has been postulated as a solution for numerous problems in different branches of scien...
The effective utilization at scale of complex machine learning (ML) techniques for HEP use cases pos...
In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a peta...
As the models and the datasets to train deep learning (DL) models scale, system architects are faced...
For certain problems, training deep artificial neural networks can require far more compute resource...
Deep Neural Networks (DNNs) enable computers to excel across many different applications such as ima...
Abstract—In this paper we present a framework to enable data-intensive Spark workloads on MareNostru...
Deep neural networks are trained by solving huge optimization problems with large datasets and milli...
Deep learning becomes a hot topic recently in various areas, from industry to academia. More and m...
Continuously increasing data volumes from multiple sources, such as simulation and experimental meas...
Machine learning has established itself as a powerful tool for the construction of decision making m...