This article proposes an adaptive action-selection method for a model-free reinforcement learning system, based on the concept of the ‘reliability of internal prediction/estimation’. This concept is realized using an internal variable, called the Reliability Index (RI), which estimates the accuracy of the internal estimator. We define this index for a value function of a temporal difference learning system and substitute it for the temperature parameter of the Boltzmann action-selection rule. Accordingly, the weight of exploratory actions adaptively changes depending on the uncertainty of the prediction. We use this idea for tabular and weighted-sum type value functions. Moreover, we use the RI to adjust the learning coefficient in addition...