In many-task computing (MTC), applications such as scientific workflows or parameter sweeps communicate via intermediate files; application performance strongly depends on the file system in use. The state of the art uses runtime systems providing in-memory file storage that is designed for data locality: files are placed on those nodes that write or read them. With data locality, however, task distribution conflicts with data distribution, leading to application slow-down, and worse, to prohibitive storage imbalance. To overcome these limitations, we present MemFS, a fully symmetrical, in-memory runtime file system that stripes files across all compute nodes, based on a distributed hash function. Our cluster experiments with Montage and ...
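The striping idea described in this abstract can be illustrated with a minimal sketch. It is not MemFS code: the function names (`stripe_owner`, `stripe_file`, `read_file`), the SHA-1 based hash, the modulo placement, and the stripe size are all assumptions chosen for illustration. The sketch only shows how hashing a (file, stripe index) pair can spread one file over every node regardless of which node wrote it.

```python
from __future__ import annotations

import hashlib


def stripe_owner(path: str, stripe_index: int, nodes: list[str]) -> str:
    """Pick the node for one stripe by hashing (path, stripe index).

    Hypothetical placement rule: SHA-1 of the key, reduced modulo the node count.
    """
    key = f"{path}:{stripe_index}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def stripe_file(
    path: str, data: bytes, nodes: list[str], stripe_size: int = 8
) -> dict[str, list[tuple[int, bytes]]]:
    """Split data into fixed-size stripes and assign each one to a node."""
    placement: dict[str, list[tuple[int, bytes]]] = {node: [] for node in nodes}
    for offset in range(0, len(data), stripe_size):
        index = offset // stripe_size
        chunk = data[offset:offset + stripe_size]
        placement[stripe_owner(path, index, nodes)].append((index, chunk))
    return placement


def read_file(placement: dict[str, list[tuple[int, bytes]]]) -> bytes:
    """Reassemble a file by collecting stripes from all nodes in index order."""
    stripes = sorted(s for node_stripes in placement.values() for s in node_stripes)
    return b"".join(chunk for _, chunk in stripes)


if __name__ == "__main__":
    nodes = ["node0", "node1", "node2", "node3"]  # illustrative node names
    placement = stripe_file("/work/intermediate.dat", bytes(range(50)), nodes)
    for node, stripes in placement.items():
        print(node, [index for index, _ in stripes])
```

Because placement depends only on the hash of the key, any node can locate any stripe without a central metadata lookup, and where a task runs no longer determines where its data is stored.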
Traditional cloud computing technologies, such as MapReduce, use file systems as the system-wide sub...
Distributed storage systems running on clusters of commodity hardware are challenged by the ever-gro...
We seek to enable efficient large-scale parallel execution of applications in which a shar...
Data-intensive scientific workflows are composed of many tasks that exhibit data precedence...
Data-intensive scientific workflows exhibit inter-task dependencies that generate file-base...
Compute clusters, consisting of many, uniformly built nodes, are used to run a large spectrum of dif...
Many scientific computations can be expressed as Many-Task Computing (MTC) applications. In such sce...
The adoption of low-latency persistent memory modules (PMMs) upends the long-established model of re...
The ever-increasing power of supercomputer systems is both driving and enabling the emergence of new...
Scientific domains such as astronomy or bioinformatics produce increasingly large amounts of data th...
Whereas traditional scientific applications are computationally intensive, recent applications requi...
The effective management of enormous data volumes on the Cloud platform has attracted devoting resea...
In this report we address the problem of data management in clouds for the MapReduc...