Many scientific workflow scheduling algorithms need to know task runtimes a priori to schedule efficiently. In heterogeneous cluster infrastructures, this problem is aggravated because these runtimes are required for each task-node pair. Using historical data is often not feasible, as logs are typically not retained indefinitely and both workloads and infrastructure change. In contrast, online methods, which predict task runtimes on specific nodes while the workflow is running, have to cope with the lack of example runs, especially during start-up. In this paper, we present Lotaru, a novel online method for locally estimating task runtimes in scientific workflows on heterogeneous clusters. Lotaru first prof...
Runtime scheduling and workflow systems are an increasingly popular algorithmic component in HPC bec...
In many data-intensive applications, workflow is often used as an important mo...
Many functions in today’s society are immensely dependent on data. Data drives everything from busin...
Scientific workflows typically comprise a multitude of different processing steps which often are ex...
Estimates of task runtime, disk space usage, and memory consumption are commonly used by scheduling...
Many fields of modern science require huge amounts of computation, and workflows are a very popular ...
Scientific workflow management systems like Nextflow support large-scale data analysis by abstractin...
With the increasing amount of data available to scientists in disciplines as diverse as bioinformat...
The role of data in modern scientific workflows becomes more and more crucial. The unprecedented amo...
In this paper, we present a scheduling scheme to estimate the turnaround time of parallel...
Workflow is an important model for big data processing and resource provisioni...
The Genomic Data Commons (GDC) is a data platform for managing, processing, analyzing, and sharing c...
This article tackles the problem of scheduling multiuser scientific workflows ...
Task graphs provide a simple way to describe scientific workflows (sets of tasks with dependencies) ...