Shared memory multiprocessors make it practical to convert sequential programs to parallel ones in a variety of applications. An emerging class of shared memory multiprocessors is the cache-coherent nonuniform memory access (CC-NUMA) machine, which has private caches and a cache coherence protocol. Proposed hardware optimizations to CC-NUMA machines can shorten the time processors lose because of cache misses and invalidations. The authors examine the cost-performance trade-offs of four proposed optimizations: release consistency, adaptive sequential prefetching, migratory sharing detection, and hybrid update/invalidate with a write cache. The four optimizations differ with respect to which application features they attack, what...