Continuous-time Markov decision processes are an important class of models for a wide range of applications, from cyber-physical systems to synthetic biology. A central problem is how to devise a policy that controls the system so as to maximise the probability of satisfying a set of temporal logic specifications. Here we present a novel approach based on statistical model checking and an unbiased estimate of a functional gradient in the space of possible policies. The statistical approach has several advantages over conventional approaches based on uniformisation: it can also be applied when the model is available only as a black box, and it does not suffer from state-space explosion. The use of a stochastic gradient to guide our search...
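To make the gradient-guided search concrete, below is a minimal, self-contained sketch, not the paper's implementation: it assumes a toy three-state CTMDP with made-up rates, a softmax (Gibbs) policy parametrisation over two actions, and a REINFORCE-style score-function estimator of the gradient of the time-bounded reachability probability. All model details, names, and parameters are illustrative assumptions rather than anything taken from the abstract above.

```python
# Sketch of simulation-based policy-gradient search for time-bounded
# reachability in a small CTMDP (hypothetical model and parametrisation).
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, GOAL, HORIZON = 3, 2, 2, 5.0
# rates[s, a, s']: exponential transition rates of the CTMDP (assumed values);
# state 2 is the goal and is absorbing.
rates = np.array([
    [[0.0, 1.0, 0.2], [0.0, 0.3, 0.8]],   # state 0
    [[0.5, 0.0, 0.4], [0.1, 0.0, 1.2]],   # state 1
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],   # state 2 = goal
])

def policy(theta, s):
    """Softmax (Gibbs) action distribution in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def simulate(theta):
    """Simulate one trajectory; return (goal reached?, score-function term)."""
    s, t, score = 0, 0.0, np.zeros_like(theta)
    while s != GOAL and t < HORIZON:
        p = policy(theta, s)
        a = rng.choice(N_ACTIONS, p=p)
        score[s] -= p                  # grad of log softmax is e_a - p:
        score[s, a] += 1.0             # subtract p, add 1 at the chosen action
        exit_rate = rates[s, a].sum()
        if exit_rate == 0.0:
            break                      # no enabled transitions
        t += rng.exponential(1.0 / exit_rate)
        if t >= HORIZON:
            break                      # time bound exceeded before next jump
        s = rng.choice(N_STATES, p=rates[s, a] / exit_rate)
    return float(s == GOAL), score

def gradient_step(theta, n_runs=500, lr=0.5):
    """REINFORCE-style unbiased estimate of the gradient of P(reach goal)."""
    grad = np.zeros_like(theta)
    for _ in range(n_runs):
        reached, score = simulate(theta)
        grad += reached * score
    return theta + lr * grad / n_runs

theta = np.zeros((N_STATES, N_ACTIONS))
for _ in range(30):
    theta = gradient_step(theta)
est = np.mean([simulate(theta)[0] for _ in range(2000)])
print(f"estimated reachability probability under learned policy: {est:.3f}")
```

The estimator only needs sampled trajectories, which is why such a scheme can, in principle, be applied when the model is available only as a black-box simulator.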
We study continuous-time stochastic games with time-bounded reachability objectives. We show that ea...
This article proposes several two-timescale simulation-based actor-critic algorithms for solution of...
We consider the problem of finding a near-optimal policy using value-function ...
Continuous-time Markov decision processes provide a very powerful mathematical framework to solve...
A continuous-time Markov decision process (CTMDP) is a generalization of a continuous-time M...
A continuous-time Markov decision process (CTMDP) is a generalization of a continuous-time Markov ch...
Policy search is a method for approximately solving an optimal control problem...
Continuous time Markov Chains (CTMCs) are a convenient mathematical model for a broad rang...
We study the time-bounded reachability problem for continuous-time Markov decision processes (CTMDPs...
Continuous time Markov Chains (CTMCs) are a convenient mathematical model for a broad range of natur...
We study the time-bounded reachability problem for continuous-time Markov decision processes (CTMDPs...
Partially observable Markov decision processes are interesting because of their ability to model mos...
We study the time-bounded reachability problem for continuous time Markov decision processes (CTMDPs...
Continuous-time Markov decision processes (CTMDPs) are widely used for the control of queueing syste...