For most relevant computation, the energy and time needed for data movement dominate those for performing arithmetic operations on all computing systems today. Hence, it is of critical importance to understand the minimal total data movement achievable during the execution of an algorithm. The achieved total data movement for different schedules of an algorithm can vary widely depending on how efficiently the cache is used, e.g., untiled versus effectively tiled matrix-matrix multiplication. A significant current challenge is that no existing tool is able to meaningfully quantify the potential reduction in the data movement of a computation that can be achieved by more effective use of the cache through operation rescheduling. Asymptotic par...
The replacement policies known as MIN and OPT are optimal for a two-level memory hierarchy. The comp...
We describe a new automatic static analysis for determining upper-bound functions on the use of quan...
Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel algorit...
Researchers and practitioners have long worked on improving the computatio...
Evaluating the complexity of an algorithm is an important step when developing applications, as it im...
Using a directed acyclic graph (dag) model of algorithms, we solve a problem related to precedence-c...
Array contraction is a compilation optimization used to reduce memory consumpt...
In the directed acyclic graph (dag) model of algorithms, consider the following problem for preceden...
Technology trends are making the cost of data movement increasingly dominant, ...
Technology trends will cause data movement to account for the majority of energ...
The movement of data (communication) between levels of a memory hierarchy, or between parallel proce...
The roofline model is a popular approach to "bounds and bottleneck" performan...
This paper establishes time-space tradeoffs for some algebraic problems in the branching pro...