All supercomputers in the world exploit data-level parallelism (DLP), for example by using single instructions to operate on several data elements. Improving vector processing is therefore key for exascale computing. Control-flow divergence is one of the main factors limiting vector performance. Most modern vector instruction sets rely on predication to support divergence control. Nevertheless, the performance and energy consumption of predicated code are usually insensitive to the number of active elements. Since vector register sizes tend to double every four years, this insensitivity will make the energy efficiency of exascale systems sub-optimal. This paper proposes the Compiler-Assisted Compaction/Restoration (CACR) technique. The basel...