In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
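For concreteness, a sketch of the form this claim refers to, in notation that is assumed here rather than defined in the abstract ($\mu_\theta$ denotes the deterministic policy with parameters $\theta$, $Q^{\mu}$ its action-value function, and $\rho^{\mu}$ the corresponding discounted state distribution):
\[
\nabla_\theta J(\mu_\theta) \;=\; \mathbb{E}_{s \sim \rho^{\mu}}\!\Big[ \nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)} \Big].
\]
That is, the gradient integrates only over the state space, rather than over both state and action spaces as in the stochastic policy gradient, which underlies the efficiency claim above.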