Optimal Transport theory not only defines a notion of distance between probability measures, but also provides a way to align two distributions. Over the past few years, Optimal Transport has found many applications in Machine Learning, such as approximating a distribution in GANs for the generation of new points, or adapting labeled source data to unlabeled target examples to solve transfer learning tasks. Given a ground metric that allows one to compare two points of a vector space, the transport between distributions is said to be optimal when it minimizes the global cost of moving one distribution onto the other. Although often difficult to compute, the corresponding Wasserstein distance intuitively generalizes the usual metrics between points.
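As an illustration, the one-dimensional case admits a simple closed form: for two equal-size empirical samples with uniform weights and the absolute-value ground metric, the optimal transport plan matches the sorted samples, so the 1-Wasserstein distance is the average of the pointwise distances after sorting. A minimal sketch (the function name is ours, not from the text):

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical samples.

    With uniform weights and ground metric |x - y|, the optimal coupling
    pairs the i-th smallest point of xs with the i-th smallest of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    # Sorting realizes the optimal (monotone) transport plan in 1D.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

# Moving the sample {0, 1, 2} onto {1, 2, 4} costs (1 + 1 + 2) / 3.
print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 4.0]))
```

In higher dimensions no such closed form exists and the optimal plan must be found by solving a linear program, which is what makes the distance often difficult to compute.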