The identification of switched systems is a challenging problem that combines combinatorial (sample-mode assignment) and continuous (parameter estimation) aspects. A general framework for this problem was recently developed; it alternates between parameter estimation and sample-mode assignment, solving both tasks to global optimality under mild conditions. This article extends that framework to the nonlinear case, which further aggravates the combinatorial complexity of the identification problem, since a model structure selection task must be addressed for each mode of the system. To address this issue, we reformulate the learning problem in terms of the optimization of a probability distribution over the space of all possib...