Scientific workflows are often used to automate large-scale data analysis pipelines on clusters, grids, and clouds. However, because workflows can be extremely data-intensive, and are often executed on shared resources, it is critical to be able to limit or minimize the amount of disk space that workflows use on shared storage systems. This paper proposes a novel and simple approach that constrains the amount of storage space used by a workflow by inserting data cleanup tasks into the workflow task graph. Unlike previous solutions, the proposed approach provides guaranteed limits on disk usage, requires no new functionality in the underlying workflow scheduler, and does not require estimates of task runtimes. Experimental results show that ...
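The abstract above describes constraining disk usage by inserting data cleanup tasks into the workflow task graph. As a rough illustration of that idea (not the paper's actual algorithm; all names and the data structures are assumptions), a cleanup task for each file can be made dependent on the file's last consumer, so the file is deleted as soon as no remaining task needs it:

```python
# Illustrative sketch only: insert cleanup tasks into a workflow DAG so each
# file is deleted once its last consumer has run, bounding peak disk usage.
# Task/file names and the dict-based graph encoding are assumptions, not
# taken from the paper.

from collections import defaultdict

def insert_cleanup_tasks(tasks, produces, consumes):
    """tasks: task ids in topological order.
    produces: task id -> set of files it writes.
    consumes: task id -> set of files it reads.
    Returns a dict mapping each new cleanup task to the set of
    tasks it must run after (its dependencies)."""
    last_consumer = {}
    for t in tasks:                          # walk in topological order
        for f in consumes.get(t, ()):
            last_consumer[f] = t             # later consumers overwrite earlier
        for f in produces.get(t, ()):
            last_consumer.setdefault(f, t)   # never-read outputs: clean after producer
    cleanup_deps = defaultdict(set)
    for f, t in last_consumer.items():
        cleanup_deps[f"cleanup_{f}"].add(t)  # cleanup waits for last consumer
    return dict(cleanup_deps)
```

Because each cleanup task only depends on ordinary task-graph edges, a scheduler that understands the DAG needs no new functionality, which matches the abstract's claim.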
Scientific workflows in High Performance Computing (HPC ...
Resource abundance is apparent in today's multicore era. Workflow applications common in science and...
Many applications in science and engineering are becoming increasingly complex and large-scale. These appl...
In this paper we examine the issue of optimizing disk usage and scheduling large-scale scientific wo...
Among scheduling algorithms for scientific workflows, graph partitioning is a technique...
With the increasing amount of data available to scientists in disciplines as diverse as bioinformat...
2014-09-15: Scientific workflows are a means of defining and orchestrating large, complex, multi-stage...
In high-performance computing (HPC), workflow-based workloads are usually data intensive for explora...
Data-intensive workflows stage large amounts of data in and out of compute resources. The data stagi...
This thesis is motivated by the fact that there is an urgent need to run scientific many-task workfl...
Estimates of task runtime, disk space usage, and memory consumption are commonly used by scheduling...
This paper presents the Maximum Effective Reduction (MER) algorithm, which optimizes the resource ef...
Scientific workflows have become the primary mechanism for conducting analyses on distributed comput...