We study the problem of estimating the largest gain of an unknown linear time-invariant filter, also known as the H∞ norm of the system. Using ideas from the stochastic multi-armed bandit framework, we present a new algorithm that sequentially designs an input signal in order to estimate this quantity from input-output data. The algorithm is shown empirically to beat an asymptotically optimal method, known as Thompson Sampling, in the sense of its cumulative regret. Finally, for a general class of algorithms, a lower bound on the performance of estimating the H∞ norm is derived.
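The quantity estimated above, the H∞ norm, is the peak magnitude of the filter's frequency response over all frequencies. For a *known* filter this peak gain can be evaluated directly on a frequency grid; a minimal NumPy sketch (the FIR coefficients are illustrative, not taken from the paper):

```python
import numpy as np

# The H-infinity norm of a stable discrete-time LTI filter is the peak
# gain of its frequency response: sup over w in [0, pi] of |H(e^{jw})|.
# Hypothetical FIR filter used purely for illustration.
b = np.array([1.0, 0.5, 0.25])            # impulse response (FIR taps)

w = np.linspace(0.0, np.pi, 4096)          # dense frequency grid on [0, pi]
# H(e^{jw}) = sum_k b[k] * e^{-jwk}, evaluated at every grid point at once
H = b @ np.exp(-1j * np.outer(np.arange(len(b)), w))

hinf = np.abs(H).max()                     # grid approximation of the H-inf norm
# For these all-positive taps the peak is at w = 0, so hinf = 1.0 + 0.5 + 0.25
```

In the setting of the paper the filter is unknown, so this direct computation is unavailable; the bandit algorithm instead probes the system with designed inputs to locate the peak-gain frequency from noisy input-output data.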
This paper considers the problem of maximizing an expectation function over a ...
In this paper we consider the problem of actively learning the mean values of distributions associat...
Algorithms based on upper-confidence bounds for balancing exploration and expl...
A novel approach to the gain estimation problem, using a multi-armed bandit formulation, is studied. ...
We present the gain estimation problem for linear dynamical systems as a multi-armed bandit. This is...
We consider a bandit problem which involves sequential sampling from two populations (arms). Each ar...
In this paper, we consider stochastic multi-armed bandits (MABs) with heavy-tailed rewards, whose p-...
This thesis investigates a new method to estimate the system norm using reinforcement learning. Give...
The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in...
We consider a linear stochastic bandit problem where the dimension K of the unknown parameter is l...
We consider a stochastic bandit problem with a possibly infinite number of arms. We write p∗ for the...
The stochastic multi-armed bandit problem is an important model for studying the exploration-exploit...
We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi...