This paper proposes a novel algorithm for solving discrete online learning problems under stochastic constraints, where the learner aims to maximize the cumulative reward given that some additional constraints on the sequence of decisions need to be satisfied on average. We propose the Lagrangian exponentially weighted average (LEWA) algorithm, which is a primal-dual variant of the well-known exponentially weighted average algorithm, inspired by the theory of the Lagrangian method in constrained optimization. We establish expected and high-probability bounds on the regret and the constraint violation of the LEWA algorithm in both the full-information and bandit feedback models.
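The abstract describes LEWA only at a high level, so the following is a minimal, hypothetical sketch of the generic primal-dual pattern it names: the primal step runs exponentially weighted averaging on Lagrangian-penalized rewards, and the dual step performs projected gradient ascent on the average constraint violation. The function name `lewa_sketch`, the step sizes `eta` and `mu`, and the per-round cost/budget formulation are illustrative assumptions, not the paper's actual specification.

```python
import numpy as np

def lewa_sketch(rewards, costs, eta=0.1, mu=0.1, budget=0.0, seed=0):
    """Hypothetical primal-dual EWA sketch (NOT the paper's exact algorithm).

    rewards, costs: arrays of shape (T, K) with per-round reward and
    constraint cost for each of K arms; the constraint asks that the
    average incurred cost stay at or below `budget`.
    """
    rng = np.random.default_rng(seed)
    T, K = rewards.shape
    w = np.ones(K)      # primal: unnormalized EWA weights over arms
    lam = 0.0           # dual: Lagrange multiplier for the constraint
    picks = []
    for t in range(T):
        p = w / w.sum()             # play distribution from current weights
        picks.append(rng.choice(K, p=p))
        # Primal step: EWA update on reward minus lambda-weighted cost.
        lagrangian = rewards[t] - lam * costs[t]
        w *= np.exp(eta * lagrangian)
        # Dual step: projected gradient ascent on expected violation.
        lam = max(0.0, lam + mu * (p @ costs[t] - budget))
    return picks, lam
```

In the full-information model the learner would observe the whole `rewards[t]` and `costs[t]` vectors as above; a bandit variant would instead substitute importance-weighted estimates for the unobserved entries.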
Most work on sequential learning assumes a fixed set of actions that are available all the time. How...
In this research we study some online learning algorithms in the online convex optimization framewor...
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
In this paper we consider online learning in finite Markov decision processes (MDPs) with changing ...
The greedy algorithm has been extensively studied in the field of combinatorial optimization for decades....
In this paper, we consider online continuous DR-submodular maximization with linear stochastic long-...
We develop an online actor-critic reinforcement learning algorithm with function approximation for a...
Thesis (Master's)--University of Washington, 2020. In this thesis, we consider online continuous DR-su...
Reinforcement learning deals with the problem of sequential decision making in uncertain stochastic ...
In this paper, we investigate the power of online learning in stochastic network optimization with u...
In this paper, we combine optimal control theory and machine learning techniques to propose and solv...
This paper develops a methodology for regret minimization with stochastic first-order oracle feedbac...
Much of the work in online learning focuses on the study of sublinear upper bounds on the regret. In...
Online optimization with multiple budget constraints is challenging since the online decisions over ...
We present methods for online linear optimization that take advantage of benign (as opposed to worst...