Applications based on unstructured meshes are typically compute-intensive, leading to long running times. In principle, state-of-the-art hardware, such as multi-core CPUs and many-core GPUs, could be used to accelerate them, but these esoteric architectures require specialised knowledge to achieve optimal performance. OP2 is a parallel programming layer which attempts to ease this programming burden by allowing programmers to express parallel iterations over elements in the unstructured mesh through an API call, a so-called OP2-loop. The OP2 compiler infrastructure then uses source-to-source transformations to realise a parallel implementation of each OP2-loop and to discover opportunities for optimisation. In this paper, we describe how...