The two basic performance indices characterizing the multi-target detection task in a radar system are the probability of false alarm (PFA) and the probability of detection (PD). It is well known that, when the disturbance model (i.e., clutter and noise) is perfectly known, the Neyman-Pearson (NP) detector provides the best decision strategy, i.e., it maximizes the PD for a fixed PFA. In practical scenarios, however, a priori knowledge of the statistical model of the disturbance is rarely available. In this paper, we investigate the robustness of a reinforcement learning (RL) based Wald-type test intended to guarantee reliable detection performance even without knowledge of the disturbance distribution. Specifically,...