Value function approximation plays a central role in Approximate Dynamic Programming (ADP), where it is used to overcome the so-called curse of dimensionality associated with real stochastic processes. In this regard, we propose a novel Least-Squares Temporal Difference (LSTD) based method: the “Multi-trajectory Greedy LSTD” (MG-LSTD). It is an exploration-enhanced recursive LSTD algorithm with policy improvement embedded within the LSTD iterations, and it uses multi-trajectory Monte Carlo simulations to enhance exploration of the system state space. The method is applied to resource allocation problems modeled in a constrained Stochastic Dynamic Programming (SDP) framework. In particular, such problems are formulated as a set of...
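The LSTD building block behind the abstract above can be sketched as follows. This is only the plain batch least-squares policy-evaluation step (fit weights w so that V(s) ≈ φ(s)ᵀw from sampled transitions); the recursive updates, greedy policy improvement, and multi-trajectory exploration that distinguish MG-LSTD are not shown, and the names `lstd` and `phi` are hypothetical, not from the paper.

```python
import numpy as np

def lstd(trajectory, phi, gamma=0.95, reg=1e-6):
    """Batch LSTD policy evaluation: fit V(s) ~ phi(s) @ w.

    trajectory: list of (state, reward, next_state) tuples
    phi:        feature map, state -> np.ndarray of length k
    """
    k = phi(trajectory[0][0]).shape[0]
    A = reg * np.eye(k)          # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, r, s_next in trajectory:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)   # A = sum phi (phi - gamma phi')^T
        b += f * r                             # b = sum phi * r
    return np.linalg.solve(A, b)

# Toy 2-state chain: 0 -> 1 -> 0 -> ..., reward 1 on leaving state 1.
phi = lambda s: np.eye(2)[s]                   # tabular (one-hot) features
traj = [(0, 0.0, 1), (1, 1.0, 0)] * 50
w = lstd(traj, phi, gamma=0.9)
```

With tabular features the solution recovers the exact values of the chain, V(0) = 0.9/0.19 ≈ 4.74 and V(1) = 1/0.19 ≈ 5.26, which is a quick sanity check on the A and b accumulators.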
A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs)...
Big data and the curse of dimensionality are common vocabularies that researchers in different commu...
We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize ...
Abstract We present modeling and solution strategies for large-scale resource allocation problems th...
This technical report is a revised and extended version of the technical report C-2010-1. It contain...
The problem of managing the price for resource allocation arises in several applications, such as pu...
We consider finite-state Markov decision processes, and prove convergence and rate of convergence re...
We consider approximate policy evaluation for finite state and action Markov decision processes (MD...