This paper presents an efficient technique for exploring the design space of a multiprocessor platform; the technique minimizes the number of simulations needed to identify a Pareto curve over metrics such as energy and delay. Instead of semi-random search algorithms (such as simulated annealing, tabu search, or genetic algorithms), we use domain knowledge derived from the platform architecture to set up the exploration as a discrete-space Markov decision process. The system walks the design space by changing its parameters, performing simulations only when probabilistic information is insufficient to make a decision. A learning algorithm updates the probabilities of decision outcomes as simulations are performed. The proposed technique has...
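The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea under loudly stated assumptions. The two-parameter design space (`CORES`, `L2_KB`), the analytic `simulate` cost model, and the thresholds `skip_below` and `min_obs` are all hypothetical. Learned outcome probabilities are kept as per-action Beta posteriors (a simplification that ignores the current state). A move is simulated unless its posterior says, with enough evidence, that it rarely reaches a point not dominated by the current one, and each configuration is simulated at most once.

```python
import random

# Hypothetical platform parameters: core count and L2 size in KB.
CORES = [1, 2, 4, 8]
L2_KB = [64, 128, 256, 512]
# Hypothetical MDP actions: (parameter index, step of +1 or -1).
ACTIONS = [(0, +1), (0, -1), (1, +1), (1, -1)]


def simulate(cfg):
    """Hypothetical analytic stand-in for a platform simulator.
    cfg holds indices into CORES / L2_KB; returns (energy, delay)."""
    cores, l2 = CORES[cfg[0]], L2_KB[cfg[1]]
    delay = 100.0 / cores + 4.0 * cores + 2000.0 / l2  # compute + comm + misses
    power = 1.0 + 0.5 * cores + 0.002 * l2
    return power * delay, delay


def dominates(a, b):
    """True if a is no worse than b in every metric and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def explore(steps=200, skip_below=0.2, min_obs=3, seed=0):
    rng = random.Random(seed)
    # Beta(1, 1) prior on P(action reaches a point not dominated by the current one).
    alpha = {a: 1 for a in ACTIONS}
    beta = {a: 1 for a in ACTIONS}
    results = {}  # cfg -> (energy, delay); each entry costs one simulation
    limits = (len(CORES), len(L2_KB))

    def evaluate(cfg):
        if cfg not in results:
            results[cfg] = simulate(cfg)
        return results[cfg]

    state = (0, 0)
    evaluate(state)
    for _ in range(steps):
        action = rng.choice(ACTIONS)
        idx, step = action
        nxt = list(state)
        nxt[idx] += step
        if not 0 <= nxt[idx] < limits[idx]:
            continue  # move would leave the design space
        nxt = tuple(nxt)
        observations = alpha[action] + beta[action] - 2
        prob_useful = alpha[action] / (alpha[action] + beta[action])
        # Skip the simulation only when the learned probability is confidently low.
        if observations >= min_obs and prob_useful < skip_below and nxt not in results:
            continue
        if dominates(evaluate(state), evaluate(nxt)):
            beta[action] += 1  # outcome: move led to a dominated point
        else:
            alpha[action] += 1  # outcome: move led to a non-dominated point
            state = nxt
    points = list(results.values())
    front = [pt for pt in points if not any(dominates(q, pt) for q in points)]
    return sorted(front), len(results)


if __name__ == "__main__":
    front, sims = explore()
    print(f"{sims} simulations, {len(front)} Pareto points")
    for energy, delay in front:
        print(f"energy={energy:.1f} delay={delay:.1f}")
```

Caching results means revisiting a configuration is free, so the posteriors can keep learning from moves into already simulated points even after an action has been marked unpromising.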