We propose a new sequential decision-making setting that combines key aspects of two established online learning problems with bandit feedback. The optimal action to play at any given moment depends on an underlying latent state that is not directly observable by the agent. Each state is associated with a context distribution, possibly corrupted, from which observed contexts allow the agent to identify the state. Furthermore, states evolve in a Markovian fashion, so the state history provides useful information for estimating the current state. In the proposed setting, we tackle the challenge of deciding which of these two sources of information the agent should base its arm selection on. We present an algorithm that uses a referee to dynamically combine the ...
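The setting described above can be illustrated with a minimal simulation sketch. All names and numbers here are hypothetical (the abstract does not specify them): a latent state evolves via a Markov transition matrix, each state emits contexts from its own distribution, and a referee weight `w` (fixed here, whereas the paper combines the sources dynamically) blends the Markov prior from state history with the context likelihood to form a belief over states.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical environment parameters (illustrative, not from the paper) ---
n_states, n_arms = 2, 3
P = np.array([[0.9, 0.1],                 # Markov transition matrix over latent states
              [0.2, 0.8]])
context_means = np.array([-1.0, 1.0])     # state-dependent (Gaussian) context distributions
reward_means = np.array([[1.0, 0.0, 0.5], # expected reward per (state, arm)
                         [0.0, 1.0, 0.5]])

def context_likelihood(x):
    """Unnormalized P(context | state) under unit-variance Gaussians."""
    return np.exp(-0.5 * (x - context_means) ** 2)

state = 0
belief = np.array([0.5, 0.5])             # agent's belief over latent states
total_reward = 0.0
for t in range(1000):
    state = rng.choice(n_states, p=P[state])   # latent state evolves (hidden)
    x = rng.normal(context_means[state], 1.0)  # possibly-noisy observed context

    prior = belief @ P                   # source 1: Markov state history
    like = context_likelihood(x)         # source 2: observed context
    w = 0.5                              # referee weight blending the two sources
    posterior = (prior ** w) * (like ** (1 - w))
    belief = posterior / posterior.sum()

    arm = int(np.argmax(belief @ reward_means))  # greedy arm under current belief
    total_reward += rng.normal(reward_means[state, arm], 0.1)
```

The geometric blend `prior**w * like**(1-w)` is one simple way to interpolate between trusting the state history (`w = 1`) and trusting the context alone (`w = 0`); the referee in the proposed algorithm would adapt this trade-off online.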
Contextual multi-armed bandit (MAB) algorithms have been shown promising for maximizing cumulative r...
A standard assumption in contextual multi-armed bandits is that the true context is perfectly known bef...
Contextual bandits are canonical models for sequential decision-making under uncertainty in environm...
We consider a Latent Bandit problem where the latent state keeps changing in time according to an un...
The data explosion and the development of artificial intelligence (AI) have fueled the demand for recomme...
Learning an action policy for autonomous agents in a decentralized multi-agent environment has remained...
University of Minnesota Ph.D. dissertation. May 2020. Major: Statistics. Advisor: Yuhong Yang. 1 com...
The bandit problem models a sequential decision process between a player and an environment. In the ...
Bandit problems provide an interesting and widely-used setting for the study of sequential decision-...
Many well-studied online decision-making and learning models rely on the assumption that the environ...
Multi-armed bandit (MAB) is a classic model for understanding the exploration-exploitation trade-off...
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a...
In many real-world sequential decision-making problems, an action does not immediately reflect on th...
We present a method to solve the problem of choosing a set of adverts to display to each of a sequen...
In a bandit problem there is a set of arms, each of which when played by an agent yields some reward...