Wide-issue processors continue to achieve higher performance by exploiting greater instruction-level parallelism. Dynamic techniques such as out-of-order execution and hardware speculation have proven effective at increasing instruction throughput. Run-time optimization promises to provide an even higher level of performance by adaptively applying aggressive code transformations on a larger scope. This paper presents a new hardware mechanism for generating and deploying run-time optimized code. The mechanism can be viewed as a filtering system that resides in the retirement stage of the processor pipeline, accepts an instruction execution stream as input, and produces instruction profiles and sets of linked, optimized traces as output. ...