This paper describes several new online model-free reinforcement learning (RL) algorithms. We designed three new reinforcement algorithms, namely: QV2, QVMAX, and QV-MAX2, that are all based on the QV-learning algorithm, but in contrary to QV-learning, QVMAX and QVMAX2 are off-policy RL algorithms and QV2 is a new on-policy RL algorithm. We experimentally compare these algorithms to a large number of different RL algorithms, namely: Q-learning, Sarsa, R-learning, Actor-Critic, QV-learning, and ACLA. We show experiments on five maze problems of varying complexity. Furthermore, we show experimental results on the cart pole balancing problem. The results show that for different problems, there can be large performance differences between the d...