Cycle-harvesting systems such as Condor have been developed to make desktop machines in a local area (which are often similar to clusters in hardware configuration) available as a compute platform. To provide a dual-use capability, opportunistic jobs harvesting cycles from the desktop must be checkpointed before the desktop resources are reclaimed by their owners and the job is evacuated. In this paper, we investigate a new system for computing efficient checkpoint schedules in cycle-harvesting environments. Our system records the historical availability of each resource and fits a statistical model to the observations. Because checkpointing must often traverse the network (i.e., the desktop hosts do not provide sufficient persistent storag...
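The idea of fitting a statistical model to recorded availability and deriving a checkpoint schedule from it can be illustrated with a minimal sketch. The abstract as excerpted here does not name the model family or the scheduling rule, so everything below is an assumption made for illustration: an exponential availability model fitted by maximum likelihood, Young's first-order approximation for the checkpoint interval, and the hypothetical names `fit_exponential_rate`, `young_interval`, and the sample durations. It is not the system described in the paper.

```python
import math


def fit_exponential_rate(durations):
    """Maximum-likelihood rate of an exponential model: n / sum(durations).

    `durations` are observed availability periods in seconds
    (hypothetical data; an exponential model is an assumption here).
    """
    if not durations or any(d <= 0 for d in durations):
        raise ValueError("durations must be a non-empty list of positive values")
    return len(durations) / sum(durations)


def young_interval(checkpoint_cost, rate):
    """Young's first-order approximation: sqrt(2 * C * MTBF).

    `checkpoint_cost` is the time in seconds to write one checkpoint,
    e.g. across the network; MTBF is the mean availability, 1 / rate.
    """
    mean_availability = 1.0 / rate
    return math.sqrt(2.0 * checkpoint_cost * mean_availability)


# Hypothetical availability history for one desktop, in seconds.
durations = [3600.0, 7200.0, 1800.0, 5400.0]
rate = fit_exponential_rate(durations)

# Assumed checkpoint cost of 60 seconds.
interval = young_interval(60.0, rate)
print(f"mean availability: {1.0 / rate:.0f} s, checkpoint every {interval:.0f} s")
```

With these made-up numbers the mean availability is 4500 seconds and the suggested interval is roughly 735 seconds. A higher checkpoint cost, such as a slow network transfer, lengthens the interval, while a shorter mean availability shortens it.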
Performance evaluation of checkpoint rollback recovery strategies for distributed systems is a field...
As computational clusters rapidly grow in both size and complexity, system reliability and, in parti...
We consider a context where the available resources of the Intranet of a compa...
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational res...
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational re...
In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycl...
Organisations such as research institutions and universities often increase utilisation of t...
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational re...
Checkpointing is a procedure of storing process state to a file, which is later used to re...
This paper examines the performance of synchronous checkpointing in a distributed computing environm...
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures th...
By leveraging the enormous amount of computational capabilities, scientists today are able to ...
Forshaw M, McGough AS, Thomas N. (2014) Energy-efficient checkpointing in high-throughput cycle-stea...
Checkpointing has been widely adopted in support of fault-tolerance and job migration essential for ...
To make progress in the face of failures, long-running parallel applications need to save their ...