We consider a sequential assortment selection problem in which user choice follows a multinomial logit (MNL) choice model with unknown parameters. In each period, the learning agent observes d-dimensional contextual information about the user and the N available items, offers the user an assortment of size K, and receives bandit feedback: only the item chosen from the assortment is observed. We propose upper confidence bound (UCB) based algorithms for this MNL contextual bandit. The first algorithm is a simple and practical method that achieves Õ(d√T) regret over T rounds. The second algorithm achieves Õ(√(dT)) regret. This matches the lower bound for the MNL bandit problem, up to logarithmic terms, and improves on...
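To make the feedback model concrete, the following is a minimal sketch of how a single round of MNL choice could be simulated. It is an illustration under stated assumptions, not the algorithms proposed in the abstract: it assumes each item's utility is linear in its d-dimensional feature vector (`features @ theta`), and it assumes an outside "no choice" option with utility 0 placed at index 0. The function names `mnl_choice_probabilities` and `sample_choice` are hypothetical.

```python
import numpy as np


def mnl_choice_probabilities(features, theta):
    """Return MNL choice probabilities for one offered assortment.

    features: (K, d) array, one row per offered item (assumed linear utility).
    theta:    (d,) parameter vector (unknown to the learner in the bandit setting).
    Returns a (K + 1,) array; index 0 is the assumed outside option.
    """
    utilities = features @ theta
    # Assumption: the outside option has utility 0.
    logits = np.concatenate(([0.0], utilities))
    # Subtract the maximum for numerical stability before exponentiating.
    logits = logits - logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()


def sample_choice(features, theta, rng):
    """Sample the bandit feedback: the index of the single chosen option."""
    probs = mnl_choice_probabilities(features, theta)
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, d = 4, 3
    features = rng.normal(size=(K, d))  # synthetic contexts, for illustration only
    theta = rng.normal(size=d)          # synthetic parameters, for illustration only
    print(mnl_choice_probabilities(features, theta))
    print(sample_choice(features, theta, rng))
```

In a learning loop, the agent would see only the sampled index for the assortment it offered, never the probabilities themselves, which is what makes the problem a bandit problem.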
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...
Often, individuals must choose among discrete alternatives with imperfect information about their va...
The contextual combinatorial semi-bandit problem with linear payoff functions is a decision-making p...
We study the assortment optimization problem under the Sequential Multinomial Logit (SML), a discret...
The classical Multinomial Logit (MNL) is a behavioral model for user choice. In this model, a user i...
Learning the optimal ordering of content is an important challenge in website design. The learning t...
In this paper, we study the dynamic assortment optimization problem under a finite selling season of...
One fundamental problem in revenue management that arises in many settings including retail and disp...
We study the problem of model selection for contextual bandits, in which the algorithm must balance ...
We consider dynamic multi-product pricing and assortment problems under an unknown demand over T per...
Assortment planning is an important problem that arises in many industries such as retailing and air...
We study a ranking problem in the contextual multi-armed bandit setting. A learning agent selects an...
We study the product assortment problem of a retail operation that faces a stream of customers who a...