Recent years have seen the widespread adoption of JSON as a data format for representing massive data collections that are managed and analysed by crucial applications. JSON data collections are usually schemaless, which allows flexible data management. However, the absence of schema information has several disadvantages: the correctness of complex queries and programs cannot be statically checked, users have no way to discover the structural properties of the underlying data, and, more generally, schema-based optimisations cannot be applied. In this paper we address the problem of inferring a schema from massive JSON datasets. Our first contribution is the identification and definition of a JSON type language, which is a good co...