Recently, the COMbinatorial Multi-Armed Bandits (COM-MAB) problem has arisen as an active research field. In systems interacting with humans, these reinforcement learning approaches use a feedback strategy as their reward function. On the study of those strategies, this paper presents three contributions: 1) we model a feedback strategy as a three-step process, namely Feedback Identification, Feedback Retrieval, and Reward Computing, where each step influences the performance of an agent; 2) based on this model, we propose a novel Reward Computing process, BUSBC, which significantly increases the global accuracy reached by optimistic COM-MAB algorithms; 3) we conduct an empirical analysis of our approach and several feedback strategies...
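As a concrete reading of this three-step model, here is a minimal Python sketch, assuming a recommendation setting with click feedback. The function names (identify_feedback, retrieve_feedback, compute_reward) and the binary-click reward rule are illustrative stand-ins, not the paper's BUSBC process, whose details are not given in this excerpt.

```python
# Minimal sketch of the three-step feedback strategy described above.
# All interfaces here are assumed for illustration; BUSBC's actual Reward
# Computing rule is not specified in this excerpt, so a simple
# binary-click rule stands in.
from dataclasses import dataclass


@dataclass
class Feedback:
    item: int        # which played arm the feedback refers to
    clicked: bool    # raw user signal


def identify_feedback(raw_events, played_arms):
    """Step 1 -- Feedback Identification: keep only events about played arms."""
    return [e for e in raw_events if e["item"] in played_arms]


def retrieve_feedback(events):
    """Step 2 -- Feedback Retrieval: map raw events to structured feedback."""
    return [Feedback(item=e["item"], clicked=e["clicked"]) for e in events]


def compute_reward(feedback, played_arms):
    """Step 3 -- Reward Computing: one reward per played arm (semi-bandit).

    Stand-in rule: reward 1.0 if the arm received a click, else 0.0.
    """
    clicked = {f.item for f in feedback if f.clicked}
    return {arm: float(arm in clicked) for arm in played_arms}


# One interaction round for a COM-MAB agent (agent interface assumed):
# played = agent.select_superarm()
# rewards = compute_reward(
#     retrieve_feedback(identify_feedback(raw_events, played)), played)
# agent.update(played, rewards)
```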
Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature...
The multi-armed bandit is a framework allowing the study of the trade-off between exploration and exploitation...
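To make this trade-off concrete, below is a minimal sketch of UCB1, a standard optimistic index policy for this framework (a textbook algorithm, not code from the cited work): each arm's empirical mean is inflated by a bonus that shrinks as the arm is pulled more often, so under-sampled arms keep being explored.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """UCB1: optimistic index policy balancing exploration and exploitation.

    `pull(arm)` returns a reward in [0, 1]; the exploration bonus
    sqrt(2 ln t / n_a) decays as arm `a` accumulates samples.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:                      # play each arm once first
            arm = t
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t + 1) / counts[a]),
            )
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
    return counts, sums


# Example: three Bernoulli arms; UCB1 concentrates its pulls on the best one.
means = [0.2, 0.5, 0.8]
counts, _ = ucb1(lambda a: float(random.random() < means[a]), 3, 5000)
```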
Nowadays, in most fields of activity, companies are strengthening their digital...
The multi-armed bandit (MAB) problem is a mathematical formulation of the exploration-exploitation trade-off...
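For reference, the standard objective attached to this formulation is minimizing cumulative regret; in conventional notation (not taken from this excerpt):

```latex
% Cumulative (pseudo-)regret after T rounds, with arm means \mu_1,\dots,\mu_K,
% optimal mean \mu^* = \max_k \mu_k, and a_t the arm played at round t:
R_T = T\,\mu^* - \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right]
```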
In this thesis, we study sequential decision-making problems in which, for...
Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed Bandits (COM-MAB) show good ...
This thesis studies several extensions of the multi-armed bandit problem, where a learner sequentially s...
Motivated by problems in search and detection, we present a solution to a Combinatorial Multi-Armed Bandit...
In this paper, we first study the problem of combinatorial pure exploration with full-bandit feedback...
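To clarify the feedback model named here, the sketch below contrasts full-bandit and semi-bandit observations for a played super-arm. These are the standard definitions, not code from the cited paper: under semi-bandit feedback the learner observes each played arm's reward, while under full-bandit feedback it only observes their aggregate.

```python
import random


def play_superarm(superarm, means, feedback="full-bandit"):
    """Draw Bernoulli rewards for the played arms and return what is observed."""
    rewards = {arm: float(random.random() < means[arm]) for arm in superarm}
    if feedback == "semi-bandit":
        return rewards                 # per-arm observations
    return sum(rewards.values())       # full-bandit: only the sum is seen


means = [0.2, 0.5, 0.8, 0.4]
print(play_superarm({0, 2}, means))                    # e.g. 1.0
print(play_superarm({0, 2}, means, "semi-bandit"))     # e.g. {0: 0.0, 2: 1.0}
```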