Future computer systems will integrate tens of multithreaded processor cores on a single chip die, resulting in hundreds of concurrent program threads sharing system resources. These designs will be the cornerstone of improving throughput in high-performance computing and server environments. However, to date, appropriate systems software (operating system, run-time system, and compiler) technologies for these emerging machines have not been adequately explored. Future processors will require sophisticated hardware monitoring units that continuously feed back resource utilization information, both to the operating system, so that it can make optimal thread co-scheduling decisions, and to software that continuously optimizes the program itself. Nev...