Bandit learning has become an increasingly popular design choice for recommender systems. Despite the community's strong interest in bandit learning, multiple bottlenecks still prevent many bandit learning approaches from reaching production. One major bottleneck is how to test the effectiveness of a bandit algorithm fairly and without data leakage. Unlike supervised learning algorithms, bandit learning algorithms depend heavily on the data collection process because of their exploratory nature. Such exploratory behavior may induce unfair evaluation in a classic A/B test setting. In this work, we apply upper confidence bound (UCB) to our large-scale short video recommender system and present a test framework fo...
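For readers unfamiliar with UCB, the selection rule can be sketched as follows. This is the textbook UCB1 variant, shown only as a point of reference; it is not claimed to be the variant deployed in the system described above. The class name `UCB1`, the helper `simulate`, and the click probabilities are hypothetical and chosen purely for illustration.

```python
import math
import random


class UCB1:
    """Textbook UCB1 over a fixed set of arms (illustrative assumption only)."""

    def __init__(self, n_arms: int):
        self.counts = [0] * n_arms    # number of times each arm was pulled
        self.means = [0.0] * n_arms   # running mean reward of each arm

    def select(self) -> int:
        # Pull every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Score = empirical mean + exploration bonus that shrinks with pulls.
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        # Incremental update of the running mean.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(click_probs, steps=5000, seed=0):
    """Run UCB1 against hypothetical Bernoulli click probabilities."""
    rng = random.Random(seed)
    bandit = UCB1(len(click_probs))
    for _ in range(steps):
        arm = bandit.select()
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0
        bandit.update(arm, reward)
    return bandit
```

Because the bandit's own choices determine which items receive feedback, the data it collects differs from the data a non-exploring control arm would collect, which is the source of the evaluation concern raised in the abstract.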
In this work, we explore an online reinforcement learning problem called the multi-armed bandit for ...
We study the task of maximizing rewards from recommending items (actions) to users sequentially inte...
Consider online learning algorithms that simultaneously make decisions and learn from feedback. Such...
For several web tasks such as ad placement or e-commerce, recommender systems must recommend multip...
Many recent recommendation systems leverage the large quantity of reviews placed by users on items. ...
This paper introduces the Banditron, a variant of the Perceptron [Rosenblatt, 1958], for the multicl...
We study the problem of batch learning from bandit feedback in the setting of extremely large action...
Multi-armed bandit is a well-formulated test bed ...
High-quality recommender systems ought to deliver both innovative and relevant content through effec...
We study recommendation in scenarios where there's no prior information about the quality of content...
In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a ...
Contextual bandit algorithms are essential for solving many real-world interac...
In sequential decision problems in an unknown environment, the decision maker often faces a dilemma ...