In recent years, Machine Learning (ML) has become widely used in software systems: it is applied in many different contexts, such as medicine, bioinformatics, finance, and automotive, to name just a few. One of the main drawbacks recognized in the literature is that there are still no consolidated approaches and strategies for ensuring the reliability of the code that implements the underlying theoretical ML algorithms. This can have a strong impact, since many critical software systems rely on ML algorithms to implement intelligent behaviors, and therefore on (potentially) unreliable code that could, in extreme cases, cause catastrophic errors, e.g., loss of life due to a wrong diagnosis by an ML-based cancer classifier. Our work aims...