Memory access latency is the primary performance bottle-neck in modern computer systems. Prefetching data before it is needed by a processing core allows substantial performance gains by overlapping significant portions of memory latency with useful work. Prior work has investigated this technique and measured potential benefits in a variety of scenarios. However, its use in speeding up Hardware Transactional Memory (HTM) has remained hitherto unexplored. In several HTM designs transactions invalidate speculatively updated cache lines when they abort. Such cache lines tend to have high locality and are likely to be accessed again when the transaction re-executes. Coarse grained transactions that update several cache lines are particularly s...