As planned, at the beginning of the project I concentrated on online aggregation for undiscounted reinforcement learning in Markov decision processes (MDPs). I had already started research on online aggregation back in Austria, so I was able to conclude this work quickly by proving regret bounds for a modified UCRL2 algorithm [2]. The modified algorithm uses confidence intervals to compute an aggregation of the estimated model MDP before computing an optimistic policy. More precisely, given an unknown MDP, the proposed algorithm UCAgg maintains confidence intervals for rewards and transition probabilities in order to define a set of plausible MDPs, just like UCRL2. However, before selecting an optimistic plausible model and a respective...
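The following is a minimal Python/NumPy sketch of the UCRL2-style ingredients mentioned above: confidence radii for rewards and transition probabilities, and a test of whether a candidate model lies in the resulting set of plausible MDPs. The radii use the constants from the original UCRL2 analysis. The aggregation step is a hypothetical stand-in, because the report does not state UCAgg's rule. Here states are grouped, as an assumption, whenever their reward and transition confidence regions overlap for every action. The function names `ucrl2_radii`, `is_plausible` and `aggregate_states` are illustrative only.

```python
import math

import numpy as np


def ucrl2_radii(counts, t, delta):
    """UCRL2-style confidence radii.

    counts: (S, A) array of visit counts N(s, a); t: current step; delta: confidence.
    Returns (reward_radius, transition_l1_radius), each of shape (S, A).
    """
    S, A = counts.shape
    n = np.maximum(1, counts)  # unvisited pairs are treated as one visit
    r_rad = np.sqrt(7 * math.log(2 * S * A * t / delta) / (2 * n))
    p_rad = np.sqrt(14 * S * math.log(2 * A * t / delta) / n)
    return r_rad, p_rad


def is_plausible(r, p, r_hat, p_hat, r_rad, p_rad):
    """True if model (r, p) lies in the set of plausible MDPs around (r_hat, p_hat)."""
    reward_ok = np.all(np.abs(r - r_hat) <= r_rad)
    # L1 distance between transition vectors, one value per (s, a) pair
    trans_ok = np.all(np.abs(p - p_hat).sum(axis=2) <= p_rad)
    return bool(reward_ok and trans_ok)


def aggregate_states(r_hat, p_hat, r_rad, p_rad):
    """HYPOTHETICAL aggregation rule (not taken from the report).

    State s joins the group of the first representative whose reward intervals
    and transition L1 balls overlap with those of s for every action;
    otherwise s becomes the representative of a new group.
    """
    S = r_hat.shape[0]
    reps = []
    labels = np.empty(S, dtype=int)
    for s in range(S):
        for g, rep in enumerate(reps):
            r_close = np.all(np.abs(r_hat[s] - r_hat[rep]) <= r_rad[s] + r_rad[rep])
            p_dist = np.abs(p_hat[s] - p_hat[rep]).sum(axis=1)  # shape (A,)
            p_close = np.all(p_dist <= p_rad[s] + p_rad[rep])
            if r_close and p_close:
                labels[s] = g
                break
        else:
            labels[s] = len(reps)
            reps.append(s)
    return labels


# Usage on a small random model (4 states, 2 actions)
rng = np.random.default_rng(0)
S, A = 4, 2
counts = rng.integers(50, 100, size=(S, A))
r_hat = rng.uniform(size=(S, A))
p_hat = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S)
r_rad, p_rad = ucrl2_radii(counts, t=int(counts.sum()), delta=0.05)
labels = aggregate_states(r_hat, p_hat, r_rad, p_rad)
print(labels)
print(is_plausible(r_hat, p_hat, r_hat, p_hat, r_rad, p_rad))  # True: the estimate itself is plausible
```

With only 50 to 100 samples per state-action pair the radii are wide, so in this example every state falls into a single group. As the counts grow, the intervals shrink and the hypothetical rule starts to separate states whose estimated rewards or transitions genuinely differ.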
Past approaches for using reinforcement learning to derive dialog control policies have assumed that...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
The reinforcement learning community has recently intensified its interest in online planning metho...
We consider the problem of minimizing the long term average expected regret of an agent in an online...
Reinforcement learning (RL) has gained an increasing interest in recent years, being expected to del...
We consider a reinforcement learning setting where the learner does not have e...
We describe how to use robust Markov decision processes for value function approximation with state...
The problem of reinforcement learning in an unknown and discrete Markov Decisi...
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer ...
We consider the problem of online reinforcement learning when several state re...
We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space...
In control systems theory, the Markov decision process (MDP) is a widely used optimization model inv...
arXiv admin note: text overlap with arXiv:2205.07704. We consider reinforcement learning in an environ...
Optimistic algorithms have been extensively studied for regret minimization in...