The multi-objective, multi-armed bandits (MOMABs) problem is a Markov decision process with stochastic rewards. Each arm generates a vector of rewards instead of a single reward and these multiple rewards might be conflicting. The agent has a set of optimal arms and the agent's goal is not only finding the optimal arms, but also playing them fairly. To find the optimal arm set, the agent uses a linear scalarized (LS) function which converts the multi-objective arms into one-objective arms. LS function is simple, however it can not find all the optimal arm set. As a result, we extend knowledge gradient (KG) policy to LS function. We propose two variants of linear scalarized-KG, LS-KG across arms and dimensions. We experimentally compare the ...