Added

* A new memory reuse optimisation has been added. This results in a slightly lower memory footprint for many programs.

* The cuda backend now uses a fast single-pass implementation for segmented scans, due to Morten Tychsen Clausen (#1375).

* futhark bench now prints interim results while it is running.

Fixed

* futhark test now provides a better error message when asked to test an undefined entry point (#1367).

* futhark pkg now detects some nonsensical package paths (#1364).

* FutharkScript now parses f x y as applying f to x and y, rather than as f (x y).

* Some internal array utility functions would not be generated if entry points exposed both unit arrays and boolean arrays (#1374).

* Nested reductions used (much) more memory for intermediate res...