This paper proposes an index-based learning algorithm for the opportunistic spectrum access (OSA) scenario, modeled as a Markov multi-armed bandit (MAB) problem. The proposed algorithm selects a transmission channel that is optimal not only in terms of data rate but also in terms of quality, i.e., signal-to-noise ratio (SNR). By choosing two distinct exploration coefficients, secondary users (SUs) can weight their desired criteria appropriately: channel quality, which leads to reliable transmission at lower power, and data rate. In the cognitive radio context, we numerically compare the proposed policy with the existing UCB1 policy and show that it outperforms traditional UCB1 in terms of tr...
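To make the idea of an index policy with two exploration coefficients concrete, the following minimal Python sketch contrasts the classical UCB1 index with a quality-aware variant. Only `ucb1_index` follows a known formula (empirical mean plus a bonus of the form sqrt(alpha * ln t / n)). The function `quality_aware_index`, its quality-gap penalty, and the parameter names `alpha` and `beta` are assumptions made for illustration; the abstract does not give the paper's index, so this should not be read as the authors' exact policy.

```python
import math


def ucb1_index(mean_reward, pulls, t, alpha=2.0):
    """Classical UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return math.inf  # an unexplored channel is always tried first
    return mean_reward + math.sqrt(alpha * math.log(t) / pulls)


def quality_aware_index(mean_reward, mean_quality, best_quality,
                        pulls, t, alpha=2.0, beta=1.0):
    """Hypothetical two-coefficient index (illustrative form only).

    alpha scales the usual exploration bonus on the data-rate term;
    beta scales a penalty proportional to the gap between this channel's
    observed quality and the best observed quality.
    """
    if pulls == 0:
        return math.inf
    quality_gap = best_quality - mean_quality
    penalty = beta * quality_gap * math.log(t) / pulls
    return ucb1_index(mean_reward, pulls, t, alpha) - penalty


def select_channel(mean_rewards, mean_qualities, pulls, t,
                   alpha=2.0, beta=1.0):
    """Return the index of the channel with the largest score."""
    best_quality = max(mean_qualities)
    scores = [
        quality_aware_index(m, q, best_quality, n, t, alpha, beta)
        for m, q, n in zip(mean_rewards, mean_qualities, pulls)
    ]
    # max() returns the first maximal element, so ties go to the lowest index
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two channels with equal availability but different observed quality.
    rewards, qualities, counts = [0.8, 0.8], [0.5, 0.9], [10, 10]
    print(select_channel(rewards, qualities, counts, t=20, beta=1.0))
    print(select_channel(rewards, qualities, counts, t=20, beta=0.0))
```

In this sketch, setting `beta` to zero reduces the rule to plain UCB1 on the data-rate term, while a larger `beta` steers selection toward channels with higher observed quality. This is one simple way to expose the trade-off the abstract describes.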