Bandit algorithms such as Thompson Sampling (TS) have been put forth for decades as useful for conducting adaptively randomized experiments. By skewing the allocation ratio towards superior arms, they can substantially improve participants' welfare with respect to particular outcomes of interest. For example, as we illustrate in this work, they can use participants' ratings to identify promising text messages for managing mental health issues and assign those messages more often. However, model-based algorithms such as TS typically assume binary or normal outcome models, which may lead to suboptimal performance on categorical rating-scale outcomes. Guided by our field experiment, we extend the application of TS to rating-scale data and show its improve...
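To make the abstract's point concrete: a natural conjugate model for categorical rating-scale outcomes is a Dirichlet-Multinomial posterior per arm, rather than the Beta-Bernoulli or normal models TS conventionally assumes. The sketch below illustrates that idea only; the arm count, 5-point scale, and simulated rating distributions are hypothetical assumptions, not the exact model or data from the study.

```python
# A minimal sketch of Thompson Sampling with a Dirichlet-Multinomial
# model per arm for rating-scale outcomes. All specifics (3 arms,
# 5-point scale, true_probs) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

K = 3                      # number of arms (e.g., candidate text messages)
ratings = np.arange(1, 6)  # 5-point rating scale
# One Dirichlet prior per arm over the 5 rating categories (uniform prior).
alpha = np.ones((K, len(ratings)))

def choose_arm():
    # Draw a category-probability vector for each arm from its posterior,
    # score each arm by its sampled expected rating, and play the argmax.
    sampled = np.array([rng.dirichlet(alpha[k]) for k in range(K)])
    expected = sampled @ ratings
    return int(np.argmax(expected))

def update(arm, rating):
    # Conjugate update: increment the count of the observed rating category.
    alpha[arm, rating - 1] += 1

# Simulated interaction loop with hypothetical true rating distributions.
true_probs = np.array([[0.3, 0.3, 0.2, 0.1, 0.1],
                       [0.1, 0.1, 0.2, 0.3, 0.3],
                       [0.2, 0.2, 0.2, 0.2, 0.2]])
for t in range(1000):
    arm = choose_arm()
    rating = rng.choice(ratings, p=true_probs[arm])
    update(arm, rating)

# Pull counts per arm (initial prior mass of 5 subtracted out);
# allocation should skew towards arm 1, which has the highest mean rating.
print("allocation counts:", alpha.sum(axis=1) - len(ratings))
```

Because the Dirichlet is conjugate to the multinomial, the posterior update is a simple count increment, which keeps per-step computation trivial even in a live field experiment.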
Conducting randomized experiments in education settings raises the question of how we can use machin...
Designing experiments often requires balancing between learning about the true treatment effects and...
In this work, we address the combinatorial optimization problem in the stochastic bandit setting wit...
Behavioral scientists are increasingly able to conduct randomized experiments in settings that enabl...
Multi-armed bandit algorithms like Thompson Sampling (TS) can be used to conduct adaptive experiment...
The aim of the research presented in this dissertation is to construct a model for personalised item...
We propose algorithms based on a multi-level Thompson sampling scheme for the stochastic multi-arme...
In this article, we consider a sequential sampling scheme for efficien...
We explore a new model of bandit experiments where a potentially nonstationary sequence of contexts ...
This work presents an extension of Thompson Sampling bandit policy for orchestrating the collection ...
We consider an experimental setting in which a matching of resources to participants has to be chose...
In field experiments, researchers commonly allocate subjects to different treatment conditions befor...
We address the problem of online sequential decision making, i.e., balancing the trade-off between e...
In multi-objective reinforcement learning (MORL), much attention is paid to generating optimal solut...
Thompson Sampling (TS) has attracted a surge of interest due to its good empirical performance, in partic...