Task parallelism as employed by the OpenMP task construct or some Intel Threading Building Blocks (TBB) components, although ideal for tackling irregular problems or typical producer/consumer schemes, bears some potential for performance bottlenecks if locality of data access is important, which is typically the case for memory-bound code on ccNUMA systems. We present a thin software layer that ameliorates the adverse effects of dynamic task distribution by sorting tasks into locality queues, each of which is preferably processed by threads that belong to the same locality domain. Dynamic scheduling is fully preserved inside each domain, and is preferred over possible load imbalance even if nonlocal access is required, making this strategy wel...
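The locality-queue idea in the abstract above can be illustrated with a minimal Python sketch. All names here (`LocalityScheduler`, `submit`, `next_task`) are hypothetical; the sketch only assumes the behavior the abstract describes: one task queue per ccNUMA locality domain, workers drawing from their own domain's queue first, and falling back to nonlocal queues rather than idling.

```python
import queue

class LocalityScheduler:
    """Hypothetical sketch: one task queue per ccNUMA locality domain."""

    def __init__(self, num_domains):
        self.queues = [queue.Queue() for _ in range(num_domains)]

    def submit(self, task, domain):
        # Tasks are sorted into the queue of the domain holding their data.
        self.queues[domain].put(task)

    def next_task(self, my_domain):
        # Dynamic scheduling inside the local domain first ...
        try:
            return self.queues[my_domain].get_nowait()
        except queue.Empty:
            pass
        # ... then accept possibly nonlocal work rather than sit idle,
        # trading some locality for load balance.
        for q in self.queues:
            try:
                return q.get_nowait()
            except queue.Empty:
                continue
        return None

sched = LocalityScheduler(2)
sched.submit("local-task", 1)
sched.submit("remote-task", 0)
t1 = sched.next_task(1)   # local queue served first
t2 = sched.next_task(1)   # local queue empty: nonlocal work accepted
```

A thread pinned to domain 1 would thus drain its own queue before touching domain 0's, which is the core of the locality-preserving strategy the abstract outlines.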
International audienceDynamic task-parallel programming models are popular on shared-memory systems,...
Efficiently scheduling application concurrency to system level resources is one of the main challeng...
Lightweight threads have become a common abstraction in the field of programming languages and opera...
Modern computer architectures expose an increasing number of parallel features supported by complex ...
Future multi- and many-core processors are likely to have tens of cores arranged in a tiled architecture...
Locality of computation is key to obtaining high performance on a broad variety of parallel architec...
Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can...
In systems with complex many-core cache hierarchy, exploiting data locality can significantly reduce...
Task parallelism raises the level of abstraction in shared memory parallel programming to simplify t...
The task parallel programming model allows programmers to express concurrency at a high level of abs...
In a general-purpose computing system, several parallel applications run simultaneously on the same ...
The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-bo...
It is often assumed that computational load balance cannot be achieved in parallel and distributed s...
Programming models such as CUDA and OpenCL allow the programmer to specify the independence of threa...
As we increase the number of cores on a processor die, the on-chip cache hierarchies that support th...