<p>This dissertation describes sequential decision-making problems in non-stationary environments. Online learning algorithms handle non-stationary environments, but they generally have no notion of a dynamic state that models the future impact of past actions. State-based models are common in stochastic control, but well-known frameworks such as Markov decision processes (MDPs) assume a known, stationary environment. In recent years, there has been growing interest in combining these two learning frameworks by considering an MDP setting in which the cost function may change arbitrarily over time. A number of online MDP algorithms have been designed to work under various assumptions about the dynamics of state tr...
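The setting described above can be illustrated with a minimal sketch. It assumes a finite MDP whose transitions are fixed and known while the cost table may change arbitrarily from round to round. The Python below computes the expected cumulative cost of a sequence of policies and compares it with the best deterministic stationary policy in hindsight. The toy two-state MDP, the helper names `expected_cost` and `best_fixed_policy`, and the use of deterministic stationary policies as the comparator are illustrative assumptions, not the dissertation's own formulation or algorithm.

```python
import itertools
from typing import List, Sequence, Tuple

# Hypothetical toy types: P[s][a] is a probability vector over next states,
# and a cost table c[s][a] is the cost of taking action a in state s.
Transitions = Sequence[Sequence[Sequence[float]]]
CostTable = Sequence[Sequence[float]]
Policy = Tuple[int, ...]  # deterministic stationary policy: policy[s] = action


def step_distribution(d: Sequence[float], P: Transitions, policy: Policy) -> List[float]:
    """Propagate the state distribution one step under a fixed policy."""
    n = len(d)
    nxt = [0.0] * n
    for s in range(n):
        for s2 in range(n):
            nxt[s2] += d[s] * P[s][policy[s]][s2]
    return nxt


def expected_cost(policies: Sequence[Policy], P: Transitions,
                  costs: Sequence[CostTable], d0: Sequence[float]) -> float:
    """Expected cumulative cost when policies[t] is played in round t.

    costs[t] may differ arbitrarily between rounds, while P stays fixed.
    """
    d, total = list(d0), 0.0
    for pi, c in zip(policies, costs):
        total += sum(d[s] * c[s][pi[s]] for s in range(len(d)))
        d = step_distribution(d, P, pi)  # today's action shapes tomorrow's state
    return total


def best_fixed_policy(P: Transitions, costs: Sequence[CostTable],
                      d0: Sequence[float], num_actions: int) -> Tuple[Policy, float]:
    """Brute-force the best deterministic stationary policy in hindsight."""
    best = None
    for pi in itertools.product(range(num_actions), repeat=len(d0)):
        cost = expected_cost([pi] * len(costs), P, costs, d0)
        if best is None or cost < best[1]:
            best = (pi, cost)
    return best


# Two states; action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
# The cost depends only on the state here, and it changes between rounds.
costs = [[[0.0, 0.0], [1.0, 1.0]],   # round 0: state 0 is free
         [[1.0, 1.0], [0.0, 0.0]],   # round 1: state 1 is free
         [[1.0, 1.0], [0.0, 0.0]]]   # round 2: state 1 is free
d0 = [1.0, 0.0]

pi_star, best = best_fixed_policy(P, costs, d0, num_actions=2)
learner = expected_cost([(0, 0)] * 3, P, costs, d0)  # a learner that never moves
regret = learner - best
# pi_star == (1, 0), best == 0.0, learner == 2.0, regret == 2.0
```

The example shows why state matters. The best fixed policy moves to state 1 in round 0, even though that move brings no benefit in round 0, so that it is already in the cheap state when the costs shift. Brute-force enumeration is only practical for tiny MDPs. It serves as a stand-in for the regret comparator, not as an online learning algorithm.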
Time-average Markov decision problems are considered for finite state and action spaces. Several...
We consider online planning in Markov decision processes (MDPs). In online planning, the agent focus...
This paper considers an online (real-time) control problem that involves an agent performin...
In this paper, we consider online learning in finite Markov decision processes (MDPs) with changing ...
We consider the learning problem under an online Markov decision process (MDP), which is a...
We consider the learning problem under an online Markov decision process (MDP), which is aimed at le...
Computing optimal or approximate policies for partially observable Markov decision process...
We study the problem of online learning in Markov Decision Processes (MDPs) when both the transition di...
This paper considers an online (real-time) control problem that involves an agent performing a discr...
A short tutorial introduction is given to Markov decision processes (MDP), including the latest acti...
We consider an upper confidence bound algorithm for learning in Markov decision processes wi...
Partially Observable Markov Decision Processes (POMDPs) provide a rich representation for agents act...
We consider online learning in finite stochastic Markovian environments where ...
We study the problem of online learning in finite episodic Markov decision pro...
We consider an online (real-time) control problem that involves an agent performing a dis...