This paper addresses the exploration-exploitation dilemma inherent in sequential decision-making, focusing on multi-armed bandit problems. These problems involve an agent deciding whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. Here we introduce a novel algorithm, approximate information maximization (AIM), which employs an analytical approximation of the entropy gradient to choose which arm to pull at each point in time. AIM matches the performance of Infomax and Thompson sampling while also offering enhanced computational speed, determinism, and tractability. Empirical evaluation of AIM indicates compliance with the Lai-Robbins asymptotic bound and demonstrates its robustness for a r...
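To make the selection criterion concrete, the sketch below implements a brute-force Infomax-style rule for a two-armed Bernoulli bandit: pull the arm whose outcome is expected to most reduce the entropy of the posterior belief about which arm is best. This is the quantity that an analytical approximation such as AIM's is meant to avoid computing exactly; the Beta-posterior model, Monte Carlo estimates, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) variable."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def p_arm0_best(alpha, beta, n_samples=5_000):
    """Monte Carlo estimate of P(arm 0 has the higher mean) under Beta posteriors."""
    draws = rng.beta(alpha, beta, size=(n_samples, 2))
    return np.mean(draws[:, 0] > draws[:, 1])

def choose_arm(alpha, beta):
    """Infomax-style rule (illustrative, not AIM itself): pull the arm whose
    observation is expected to most reduce the entropy of the belief about
    which arm is best."""
    h_now = binary_entropy(p_arm0_best(alpha, beta))
    gains = []
    for arm in range(2):
        p_success = alpha[arm] / (alpha[arm] + beta[arm])  # posterior predictive mean
        h_after = 0.0
        for outcome, prob in ((1.0, p_success), (0.0, 1.0 - p_success)):
            a, b = alpha.copy(), beta.copy()
            a[arm] += outcome
            b[arm] += 1.0 - outcome
            h_after += prob * binary_entropy(p_arm0_best(a, b))
        gains.append(h_now - h_after)  # expected entropy reduction from pulling `arm`
    return int(np.argmax(gains))

# Toy run: two Bernoulli arms with unknown means 0.6 and 0.5, uniform Beta(1, 1) priors.
true_means = np.array([0.6, 0.5])
alpha, beta = np.ones(2), np.ones(2)
for _ in range(200):
    arm = choose_arm(alpha, beta)
    reward = float(rng.random() < true_means[arm])
    alpha[arm] += reward
    beta[arm] += 1.0 - reward
```

The nested Monte Carlo estimate of the probability that one arm is best is exactly what makes this exact rule expensive, which is the motivation for replacing it with an analytical approximation of the entropy gradient.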
The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science...
We propose information-directed sampling – a new algorithm for online optimization problems in whic...
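For comparison, the information-ratio idea behind information-directed sampling can be sketched in the same two-armed Bernoulli setting: score each arm by its squared expected single-step regret divided by the information its reward yields about the identity of the best arm, and pull the minimizer. The deterministic variant below, the Beta-posterior model, and the sample-based mutual-information estimate are simplifying assumptions; the full algorithm optimizes this ratio over randomized action distributions.

```python
import numpy as np

rng = np.random.default_rng(1)

def entropy(p):
    """Elementwise Bernoulli entropy in nats."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def information_ratio_choice(alpha, beta, n_samples=10_000):
    """Deterministic IDS-style rule: minimize (expected regret)^2 divided by the
    mutual information between the pulled arm's reward and the best-arm identity."""
    theta = rng.beta(alpha, beta, size=(n_samples, 2))   # joint posterior samples
    best = np.argmax(theta, axis=1)                      # sampled optimal arm
    p_best = np.array([np.mean(best == b) for b in range(2)])
    mean_theta = theta.mean(axis=0)
    # Expected single-step regret of each arm under the posterior.
    regret = np.mean(theta.max(axis=1)) - mean_theta
    gain = np.empty(2)
    for a in range(2):
        p1 = mean_theta[a]                               # P(reward = 1 | pull a)
        p1_given_best = np.array(
            [theta[best == b, a].mean() if np.any(best == b) else p1 for b in range(2)]
        )
        # I(best arm; reward of a) = H(reward) - sum_b P(best = b) H(reward | best = b)
        gain[a] = entropy(p1) - np.sum(p_best * entropy(p1_given_best))
    ratio = regret ** 2 / np.maximum(gain, 1e-12)
    return int(np.argmin(ratio))

# Example: choose an arm under Beta(2, 1) and Beta(1, 2) posteriors.
arm = information_ratio_choice(np.array([2.0, 1.0]), np.array([1.0, 2.0]))
```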
We consider a bandit problem where at any time, the decision maker can add new arms to her considera...
Entropy maximization and free energy minimization are general physical principles for modeling the d...
Multi-armed bandit, a popular framework for sequential decision-making problems, has recently gained...
We consider a class of multi-armed bandit problems where the reward obtained by pulling an a...
In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete mul...
Sequential decision making problems often require an agent to act in an environment where data is no...
We address the problem of online sequential decision making, i.e., balancing the trade-off between e...
While in general trading off exploration and exploitation in reinforcement learning is hard, under s...
This thesis considers the multi-armed bandit (MAB) problem, both the traditional bandit feedback and...
We propose a Bayesian information-geometric approach to the exploration-exploi...
Stochastic multi-armed bandits solve the Exploration-Exploitation dilemma and ultimately maximize ...
We consider the problem of finding the best arm in a stochastic multi-armed ba...