In sparse linear bandits, a learning agent sequentially selects an action and receives reward feedback, and the reward function depends linearly on a few coordinates of the covariates of the actions. This setting has applications in many real-world sequential decision-making problems. In this paper, we propose a simple and computationally efficient sparse linear estimation method called PopArt that enjoys a tighter $\ell_1$ recovery guarantee compared to Lasso (Tibshirani, 1996) in many problems. Our bound naturally motivates an experimental design criterion that is convex and thus computationally efficient to solve. Based on our novel estimator and design criterion, we derive sparse linear bandit algorithms that enjoy improved regret upper bounds over the state of the art (Hao et al., 2020), especially with respect to the geometry of the given arm set. Finally, we prove a matching lower bound for sparse linear bandits in the data-poor regime, which closes the gap between upper and lower bounds in prior work.
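To make the abstract's description concrete, below is a minimal Python sketch of a PopArt-style estimator. It assumes (our reading of the abstract, not the paper's exact specification) that the estimator averages one-sample unbiased estimates of the form $Q^{-1} a_t r_t$, where $Q$ is the population covariance of the sampling distribution over arms, and then hard-thresholds small coordinates to obtain a sparse estimate. The name popart_estimate, the uniform design, and the threshold value are illustrative assumptions; the paper's convex experimental design criterion (not reproduced here) would instead optimize the sampling distribution to tighten the $\ell_1$ recovery guarantee.

    import numpy as np

    def popart_estimate(arms, rewards, Q, threshold):
        """Hedged sketch: average the per-sample unbiased estimates
        Q^{-1} a_t r_t, then zero out coordinates below `threshold`."""
        Q_inv = np.linalg.inv(Q)
        # One-sample estimates: column t is Q^{-1} a_t * r_t, shape (d, n).
        theta_tilde = (Q_inv @ arms.T) * rewards
        theta_hat = theta_tilde.mean(axis=1)
        theta_hat[np.abs(theta_hat) < threshold] = 0.0  # hard thresholding
        return theta_hat

    rng = rng = np.random.default_rng(0)
    d, n, s = 50, 2000, 3
    theta_star = np.zeros(d)
    theta_star[:s] = 1.0                     # s-sparse ground truth
    arm_set = rng.standard_normal((200, d))  # finite arm set

    # Uniform design over the arm set (for simplicity; the paper's convex
    # criterion would choose this distribution); Q is its population covariance.
    mu = np.full(len(arm_set), 1.0 / len(arm_set))
    Q = arm_set.T @ (arm_set * mu[:, None])

    idx = rng.choice(len(arm_set), size=n, p=mu)
    arms = arm_set[idx]
    rewards = arms @ theta_star + 0.1 * rng.standard_normal(n)

    theta_hat = popart_estimate(arms, rewards, Q, threshold=0.5)
    print("support recovered:", np.flatnonzero(theta_hat))

Since $\mathbb{E}[Q^{-1} a r] = Q^{-1} \mathbb{E}[a a^\top] \theta^* = \theta^*$, each per-sample estimate is unbiased, and averaging plus thresholding recovers the support once the noise averages out; this is only a sanity-check instance of the idea, not the paper's algorithm.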
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We fo...
Kernel-based bandit is an extensively studied black-box optimization problem, in which the objective...
We consider stochastic sequential learning problems where the learner can observe the \textit{averag...
In this paper, we revisit the regret minimization problem in sparse stochastic contextual linear ban...
We consider a linear stochastic bandit problem where the dimension $K$ of the unknown parameter $\th...
Model selection in the context of bandit optimization is a challenging problem, as it requires balan...
We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits,...
We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at ea...
We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds signifi...
Recent works in bandit problems adopted lasso convergence theory in the sequential decision-making s...
In the classical multi-armed bandit problem, d arms are available to the decis...
We study the Pareto frontier of two archetypal objectives in multi-armed bandits, namely, regret min...
We study online reinforcement learning in linear Markov decision processes with adversarial losses a...
We study two model selection settings in stochastic linear bandits (LB). In the first setting, which...