In many-task computing (MTC), applications such as scientific workflows or parameter sweeps communicate via intermediate files; application performance strongly depends on the file system in use. The state of the art uses runtime systems providing in-memory file storage that is designed for data locality: files are placed on those nodes that write or read them. With data locality, however, task distribution conflicts with data distribution, leading to application slowdown, and worse, to prohibitive storage imbalance. To overcome these limitations, we present MemFS, a fully symmetrical, in-memory runtime file system that stripes files across all compute nodes, based on a distributed hash function. Our cluster experiments with Montage and BLA...
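The abstract says only that MemFS stripes files across all compute nodes using a distributed hash function. It does not give the hash, the stripe size, or the hashing key. The sketch below is a minimal in-process illustration of hash-based striping and is not MemFS's implementation. Several details are assumptions: SHA-1 over a hypothetical `path:index` key, a toy 4-byte stripe size, and Python dicts standing in for each node's memory. The names `node_for_stripe`, `write_file`, and `read_file` are hypothetical.

```python
import hashlib

# Hypothetical toy stripe size; a real system would use far larger stripes.
STRIPE_SIZE = 4


def node_for_stripe(path: str, index: int, nodes: list[str]) -> str:
    """Map (file path, stripe index) to a node by hashing; any node can compute this locally."""
    digest = hashlib.sha1(f"{path}:{index}".encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def write_file(store: dict, path: str, data: bytes, nodes: list[str],
               stripe_size: int = STRIPE_SIZE) -> None:
    """Split data into fixed-size stripes and place each on its hashed node."""
    for offset in range(0, len(data), stripe_size):
        index = offset // stripe_size
        node = node_for_stripe(path, index, nodes)
        store[node][(path, index)] = data[offset:offset + stripe_size]


def read_file(store: dict, path: str, size: int, nodes: list[str],
              stripe_size: int = STRIPE_SIZE) -> bytes:
    """Reassemble a file of known size by recomputing each stripe's location."""
    n_stripes = (size + stripe_size - 1) // stripe_size
    return b"".join(
        store[node_for_stripe(path, i, nodes)][(path, i)] for i in range(n_stripes)
    )


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(4)]
    store = {n: {} for n in nodes}  # stand-in for per-node in-memory storage
    payload = b"intermediate workflow output " * 10
    write_file(store, "/mtc/stage1/out.dat", payload, nodes)
    assert read_file(store, "/mtc/stage1/out.dat", len(payload), nodes) == payload
    print({n: len(s) for n, s in store.items()})  # stripes held by each node
```

Placement depends only on the path and stripe index. Any node can therefore locate every stripe of a file of known size without asking the node that wrote it. The stripes of one file also spread over all nodes instead of accumulating on the writer. That is the property the abstract contrasts with locality-based placement, which can cause storage imbalance.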
The effective management of enormous data volumes on the Cloud platform has attracted devoted resea...
The ever-increasing power of supercomputer systems is both driving and enabling the emergence of new...
Abstract—We seek to enable efficient large-scale parallel execution of applications in which a shar...
Abstract—Data-intensive scientific workflows are composed of many tasks that exhibit data precedence...
Abstract—Data-intensive scientific workflows exhibit inter-task dependencies that generate file-base...
Compute clusters, consisting of many uniformly built nodes, are used to run a large spectrum of dif...
Many scientific computations can be expressed as Many-Task Computing (MTC) applications. In such sce...
Scientific domains such as astronomy or bioinformatics produce increasingly large amounts of data th...
Whereas traditional scientific applications are computationally intensive, recent applications requi...
Traditional cloud computing technologies, such as MapReduce, use file systems as the system-wide sub...
The adoption of low latency persistent memory modules (PMMs) upends the long-established model of re...
In this report we address the problem of data management in clouds for the MapReduc...
Distributed storage systems running on clusters of commodity hardware are challenged by the ever-gro...