Memory disambiguation mechanisms, coupled with load/store queues in out-of-order processors, are crucial for increasing instruction-level parallelism (ILP), especially for memory-bound scientific codes. Designing ideal memory disambiguation mechanisms is too complex because it would require precise address-bit comparators; thus, modern microprocessors implement simplified and imprecise ones that perform only partial address comparisons. In this paper, we study the impact of such simplifications on the sustained performance of some real processors, such as the Alpha 21264, Power 4 and Itanium 2. Despite all the advanced features of these processors, we demonstrate in this article that memory address disambiguation mechanisms...
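To make the idea of a partial address comparison concrete, here is a minimal sketch in Python. It is not taken from the paper: the function names and the choice of 12 compared low-order bits are assumptions for illustration only, and real processors differ in which bits they compare. The sketch shows how a load and a store to different addresses can be flagged as conflicting when only part of the address is checked.

```python
def full_conflict(load_addr: int, store_addr: int) -> bool:
    """Ideal disambiguation: compare every address bit."""
    return load_addr == store_addr


def partial_conflict(load_addr: int, store_addr: int, compared_bits: int = 12) -> bool:
    """Simplified disambiguation: compare only the low-order bits.

    compared_bits is a hypothetical parameter; 12 is an arbitrary
    illustrative value, not a figure from any real processor.
    """
    mask = (1 << compared_bits) - 1
    return (load_addr & mask) == (store_addr & mask)


# Two distinct addresses that share the same low 12 bits.
store_addr = 0x10000
load_addr = 0x20000

# The full comparison sees two independent accesses.
independent = not full_conflict(load_addr, store_addr)

# The partial comparison reports a conflict that does not exist,
# so the load would be serialized behind the store needlessly.
false_dependence = partial_conflict(load_addr, store_addr)
```

In this sketch the two accesses are truly independent, yet the partial check treats them as dependent. Addresses that differ in the compared bits, such as 0x10000 and 0x10008, are still correctly recognized as independent.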
One of the main performance bottlenecks of processors today is the discrepancy between processor and...
Current microprocessors exploit high levels of instruction-level parallelism (ILP). This thesis pres...
Journal Article: Modern superscalar processors use wide instruction issue widths and out-of-order exe...
To keep up with a large degree of instruction level parallelism (ILP), the Ita...
One of the main challenges of modern processor designs is the implementation of scalable and efficie...
In high-end processors, increasing the number of in-flight instructions can improve performance by o...
PhD Thesis: Current microprocessors improve performance by exploiting instruction-level parallelism (I...
Current microprocessors improve performance by exploiting instruction-level parallelism (ILP). ILP h...
Hardware accelerators are an energy efficient alternative to general purpose processors for specific...
To keep up with a large degree of ILP, Itanium2 L2 cache system uses a complex...
Journal Paper: Current microprocessors incorporate techniques to exploit instruction-level parallelism...
Processor performance is directly impacted by the latency of the memory system. As processor core cy...
This paper describes several methods for improving the scalability of memory disambiguation hardware...