Supporting uninterrupted services for distributed soft real-time applications is hard in resource-constrained and dynamic environments, where processor or process failures and system workload changes are common. Fault-tolerant middleware for these applications must achieve high service availability and satisfactory response times for client applications. Although passive replication is a promising fault tolerance strategy for resource-constrained systems, conventional client failover approaches are non-adaptive and load-agnostic, which can cause system overloads and significantly increase response times after failure recovery. This paper presents four contributions to the study of passive replication for distributed soft real-time applic...
Clusters of message-passing computing nodes provide high-performance platforms for distributed appli...
Exascale systems of the future are predicted to have mean time between node failures (MTBF) of less ...
We present a new software architecture in which all concepts necessary to achieve fault tolerance ca...
An important class of distributed real-time and embedded (DRE) applications consists predominantly o...
A monitoring approach to the problem of constructing fault-tolerant and adaptive real-time systems, ...
Exascale systems of the future are predicted to have mean time between failures (MTBF) of less than ...
Today’s software engineering and application development trend is to take advantage of reusable soft...
Generic middleware can often not provide satisfactory solutions, but neither is it accept...
Networked computer systems are prevalent in most aspects of modern society, and we have become depen...
It is imperative to accept that failures can and will occur even in meticulously designed distribute...
As future systems scale up to extreme, their propensity to failure increases significantly, making i...
With the advent of multi- and many-core architectures, new opportunities in fault-tolerant...
Many real-time applications will have strict reliability requirements in addition to the timing requ...
Keeping strongly consistent the state of the replicas of a software service deployed across a distri...