Linear scalarized knowledge gradient in the multi-objective multi-armed bandits problem

Yahyaa, Saba
Drugan, Madalina M.
Manderick, Bernard

Publication date

January 2014

Publisher

i6doc.com publication

Abstract

The multi-objective, multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single reward, and these multiple rewards might conflict. The agent has a set of optimal arms, and its goal is not only to find the optimal arms but also to play them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function, which converts the multi-objective arms into one-objective arms. The LS function is simple; however, it cannot find the entire optimal arm set. As a result, we extend the knowledge gradient (KG) policy to the LS function. We propose two variants of linear scalarized KG: LS-KG across arms and LS-KG across dimensions. We experimentally compare the ...
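
The limitation of linear scalarization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the three mean reward vectors, the weight grid, and all function names are assumptions chosen for illustration, and the true means are treated as known rather than estimated from stochastic plays. All three arms are Pareto optimal, yet no weighted sum ever selects the third one.

```python
def linear_scalarize(mean_vector, weights):
    """Weighted sum of an arm's objectives (the LS function)."""
    return sum(w * m for w, m in zip(weights, mean_vector))


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_optimal(means):
    """Indices of arms that no other arm dominates (maximization)."""
    return [
        i
        for i, a in enumerate(means)
        if not any(dominates(b, a) for j, b in enumerate(means) if j != i)
    ]


# Hypothetical mean reward vectors: three arms, two objectives.
means = [[1.0, 0.0], [0.0, 1.0], [0.4, 0.4]]

pareto = pareto_optimal(means)

# Sweep weight vectors (w, 1 - w) and record the arm each one selects.
found = set()
for k in range(11):
    w = k / 10
    scores = [linear_scalarize(m, (w, 1.0 - w)) for m in means]
    found.add(scores.index(max(scores)))

print("Pareto optimal arms:", pareto)
print("Arms found by linear scalarization:", sorted(found))
```

In this example the third arm scores 0.4 under every weight vector, while the better of the first two arms always scores at least 0.5, so the weight sweep only ever returns the first two arms.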
