This paper considers the problem of multi-target detection for massive multiple-input multiple-output (MMIMO) cognitive radar (CR). The concept of CR is based on the perception-action cycle that senses and intelligently adapts to the dynamic environment in order to optimally satisfy a specific mission. However, this usually requires a priori knowledge of the environmental model, which is not available in most cases. We propose a reinforcement learning (RL) based algorithm for cognitive multi-target detection in the presence of unknown disturbance statistics. The radar acts as an agent that continuously senses the unknown environment (i.e., targets and disturbance) and consequently optimizes transmitted waveforms in ord...