Over the last 15 years, numerous distributed dataflow systems have appeared for large-scale data analytics, such as Apache Flink and Apache Spark. Users of such systems write data analysis programs in a (more or less) high-level API, while the systems take care of the low-level details of executing the programs in a scalable way on a cluster of machines. The systems' APIs consist of distributed collection types (or distributed matrix, graph, etc. types) and corresponding parallel operations. Distributed dataflow systems work well for simple programs, which are straightforward to express with just a few of the system-provided parallel operations. However, modern data analytics often demands the composition of larger programs, where 1) parallel op...