<p>In the stochastic multi-objective multi-armed bandit (MOMAB), each arm generates a vector of stochastic normally distributed rewards, one per objective, instead of a single scalar reward. As a result, there is not a single optimal arm but a set of optimal arms (the Pareto front) under the Pareto dominance relation. The goal of the agent is to find the Pareto front. To find the optimal arms, the agent can use a linear scalarization function, which transforms the multi-objective problem into a single-objective one by summing the weighted objectives. Selecting the weights is crucial, since different weights lead to different optimal arms from the Pareto front being selected. Usually, a predefined set of weights is used, and this can be computationally inefficient whe...
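The two notions above, Pareto dominance over mean reward vectors and linear scalarization with a weight vector, can be sketched as follows. This is a minimal illustration, not the paper's algorithm; the arm mean vectors and weights are hypothetical, and true means are assumed known here (a bandit agent would instead estimate them from sampled rewards).

```python
import numpy as np

def pareto_dominates(u, v):
    """True if reward vector u Pareto-dominates v: u is at least as
    good in every objective and strictly better in at least one."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(means):
    """Indices of arms whose mean vectors are dominated by no other arm."""
    return [i for i, m in enumerate(means)
            if not any(pareto_dominates(o, m)
                       for j, o in enumerate(means) if j != i)]

def scalarized_best_arm(means, weights):
    """Linear scalarization: arm maximizing the weighted sum of objectives."""
    scores = np.asarray(means) @ np.asarray(weights)
    return int(np.argmax(scores))

# Hypothetical mean reward vectors for four arms with two objectives.
means = [(0.9, 0.1), (0.6, 0.6), (0.1, 0.9), (0.5, 0.5)]
print(pareto_front(means))                     # non-dominated arms
print(scalarized_best_arm(means, (0.5, 0.5)))  # optimum under these weights
print(scalarized_best_arm(means, (0.9, 0.1)))  # other weights, other optimum
```

Note how the last two calls return different Pareto-optimal arms: each weight vector singles out one point of the front, which is why the choice of the weight set matters.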