Sequential decision making is a core component of many real-world applications, from computer-network operations to online ads. The main tool for such problems is reinforcement learning: an agent makes a sequence of decisions to achieve its goal, typically relying on noisy measurements of how the environment evolves. For instance, a self-driving car can be controlled by such an agent; the environment is the city in which the car maneuvers. Bandit problems are a class of reinforcement learning problems for which very strong theoretical properties can be shown. Bandit algorithms focus on the exploration-exploitation dilemma: to perform well, the agent must have a deep knowledge of its environment (exploration); however,...
Designing an adaptive sequence of exercises in Intelligent Tutoring Systems (ITS) requires to charact...
We study the problem of combinatorial pure exploration in the stochastic multi...
The main topics addressed in this thesis lie in the general domain of sequential learning, and in par...
Combinatorial stochastic semi-bandits appear naturally in many contexts where the exploration/exploi...
This thesis studies several extensions of the multi-armed bandit problem, where a learner sequentially s...
A Multi-Armed Bandit (MAB) is a learning problem where an agent sequentially chooses an action amon...
This thesis concerns "model-based" methods to solve reinforcement learning problems: an agent int...
In a bandit problem there is a set of arms, each of which when played by an agent yields some reward...
Nowadays, in most fields of activity, companies are strengthening their digi...
This thesis is dedicated to the study of resource allocation problems in uncertain environments, whe...
We investigate the structural properties of certain sequential decision-making problems with limited...
Bandit problems constitute a sequential dynamic allocation problem. The pulling agent has to explore...
The multi-armed bandit is a framework allowing the study of the trade-off between exploration and ex...