Inspired by advertising markets, we consider large-scale sequential decision making problems in which a learner must deploy an algorithm to behave optimally under uncertainty. Although many of these problems can be modeled as contextual bandit problems, we argue that the tools and techniques for analyzing bandit problems with large numbers of actions and contexts can be greatly expanded. While convexity and metric-similarity assumptions on the process generating rewards have yielded some algorithms in existing literature, certain types of assumptions that have been fruitful in offline supervised learning settings have yet to even be considered. Notably missing, for example, is any kind of graphical model approach to assuming structured rewa...
In a bandit problem there is a set of arms, each of which when played by an agent yields some reward...
The greedy algorithm has been extensively studied in the field of combinatorial optimization for decades....
In the online linear optimization problem, a learner must choose, in each round, a decision from a s...
We study the problem of decision-making under uncertainty in the bandit setting. This thesis goes be...
We consider online bandit learning in which at every time step, an algorithm h...
We study the attainable regret for online linear optimization problems with bandit feedback, where u...
Many software systems today need to make real-time decisions to optimize an objective of interes...
The multi-armed bandit (MAB) problem is a widely studied problem in the machine learning literature in t...
This thesis investigates sequential decision making tasks that fall in the framework of reinforcemen...
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Resear...
We provide the first algorithm for online bandit linear optimization whose regret after T rounds is ...
We develop a learning principle and an efficient algorithm for batch learning from logged bandit fee...
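Several snippets above describe the basic stochastic bandit setting: a set of arms, each of which yields a reward when played, where the learner observes only the reward of the arm it actually chose. The minimal sketch below simulates that setting with UCB1, a standard index policy chosen here purely for illustration. It is not the method of any work listed above; the Bernoulli reward model, the function name `ucb1`, and its parameters are hypothetical assumptions made for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Simulate a Bernoulli K-armed bandit played by UCB1.

    Hypothetical setup: arm_means are the true success probabilities of
    each arm; the learner never reads them directly, it only sees the
    0/1 reward of the arm it plays in each round.
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # how many times each arm has been played
    sums = [0.0] * k   # cumulative observed reward for each arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: play every arm once so each count is >= 1.
            arm = t - 1
        else:
            # Choose the arm with the largest empirical mean plus
            # exploration bonus sqrt(2 ln t / n_i).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Bandit feedback: only the chosen arm's reward is revealed.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return counts, total


if __name__ == "__main__":
    counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", counts)
    print("total reward:", total)
```

With arms of mean 0.2, 0.5, and 0.8, the policy should concentrate most of its 5000 pulls on the last arm while still occasionally sampling the others; the exploration bonus shrinks as an arm is played more often, which is what lets the learner trade off exploration against exploitation.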