The difficulty of effectively parallelizing code for multicore processors, combined with the end of threshold voltage scaling has resulted in the problem of \u27Dark Silicon\u27, severely limiting performance scaling despite Moore\u27s Law. To address dark silicon, not only must we drastically improve the energy efficiency of computation, but due to Amdahl\u27s Law, we must do so without compromising sequential performance. Designers increasingly utilize custom hardware to dramatically improve both efficiency and performance in increasingly heterogeneous architectures. Unfortunately, while it efficiently accelerates numeric, data-parallel applications, custom hardware often exhibits poor performance on sequential code, so complex, power-hun...