Abstract. We study online learning where the objective of the decision maker is to maximize her average long-term reward given that some average constraints are satisfied along the sample path. We define the reward-in-hindsight as the highest reward the decision maker could have achieved, while satisfying the constraints, had she known Nature’s choices in advance. We show that in general the reward-in-hindsight is not attainable. The convex hull of the reward-in-hindsight function is, however, attainable. For the important case of a single constraint the convex hull turns out to be the highest attainable function. We further provide an explicit strategy that attains this convex hull using a calibrated forecasting rule.
No-regret algorithms for online convex optimization are potent online learning tools and have been d...
We consider the problem of sequential decision making under uncertainty in which the loss caused by ...
Deep exploration requires coordinated long-term planning. We present a model-based reinforcement le...
We study online learning where a decision maker interacts with Nature with the objective of maximiz...
© 2017 Neural information processing systems foundation. All rights reserved. We study a variant of ...
In this paper, we study a sequential decision making problem. The objective is to maximize the avera...
Much of the work in online learning focuses on the study of sublinear upper bounds on the regret. In...
International audienceWe consider the problem of online optimization, where a learner chooses a deci...
International audienceWe study a class of online convex optimization problems with long-term budget ...
34 pages, 15 figuresSpurred by the enthusiasm surrounding the "Big Data" paradigm, the mathematical ...
The framework of online learning with memory naturally captures learning problems with temporal effe...
We address online linear optimization problems when the possible actions of the decision maker are r...
In this research we study some online learning algorithms in the online convex optimization framewor...
This paper proposes a novel algorithm for solving discrete online learning prob-lems under stochasti...
International audienceWe consider the problem of online learning with non-convex losses. In terms of...
No-regret algorithms for online convex optimization are potent online learning tools and have been d...
We consider the problem of sequential decision making under uncertainty in which the loss caused by ...
Deep exploration requires coordinated long-term planning. We present a model-based reinforcement le...
We study online learning where a decision maker interacts with Nature with the objective of maximiz...
© 2017 Neural information processing systems foundation. All rights reserved. We study a variant of ...
In this paper, we study a sequential decision making problem. The objective is to maximize the avera...
Much of the work in online learning focuses on the study of sublinear upper bounds on the regret. In...
International audienceWe consider the problem of online optimization, where a learner chooses a deci...
International audienceWe study a class of online convex optimization problems with long-term budget ...
34 pages, 15 figuresSpurred by the enthusiasm surrounding the "Big Data" paradigm, the mathematical ...
The framework of online learning with memory naturally captures learning problems with temporal effe...
We address online linear optimization problems when the possible actions of the decision maker are r...
In this research we study some online learning algorithms in the online convex optimization framewor...
This paper proposes a novel algorithm for solving discrete online learning prob-lems under stochasti...
International audienceWe consider the problem of online learning with non-convex losses. In terms of...
No-regret algorithms for online convex optimization are potent online learning tools and have been d...
We consider the problem of sequential decision making under uncertainty in which the loss caused by ...
Deep exploration requires coordinated long-term planning. We present a model-based reinforcement le...