This paper presents a hybrid approach to automatic parallelization of computer programs which combines static extraction of threads (tasks) with dynamic scheduling for parallel and distributed execution. Fine-grain scheduling decisions are made at compile time, and coarse-grain scheduling decisions are made at run time. The approach consists of two components: compiler technology which performs the static analysis (thread extraction), and an architecture which takes over the responsibility for scheduling and distributing the threads. Each processor is augmented with a broker, whose responsibility it is to shop for tasks for the processor to perform. This approach aims to provide an adaptive run-time distribution of computation for irregular...
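The division of labour described above (tasks fixed ahead of time, placement decided at run time by a per-processor broker) can be illustrated with a minimal sketch. It is not the paper's architecture: the shared task pool, the `run_with_brokers` function, and the use of Python threads as stand-in "processors" are all assumptions made for illustration only.

```python
import queue
import threading


def run_with_brokers(tasks, num_processors=4):
    """Run pre-extracted tasks on simulated processors.

    tasks: iterable of (task_id, fn, args) tuples. This stands in for the
    threads a compiler would extract statically (hypothetical format).
    Returns {task_id: (processor_id, result)}.
    """
    pool = queue.Queue()          # shared pool of ready tasks (assumption)
    for task in tasks:
        pool.put(task)

    results = {}
    results_lock = threading.Lock()

    def broker(proc_id):
        # Each broker "shops" for work whenever its processor is idle,
        # so processors that get cheap tasks simply come back sooner.
        while True:
            try:
                task_id, fn, args = pool.get_nowait()
            except queue.Empty:
                return            # nothing left to shop for
            value = fn(*args)     # the processor executes the task
            with results_lock:
                results[task_id] = (proc_id, value)

    workers = [threading.Thread(target=broker, args=(p,))
               for p in range(num_processors)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    # Irregular workload: task i sums i * 1000 integers.
    demo = [(i, lambda n: sum(range(n)), (i * 1000,)) for i in range(8)]
    for task_id, (proc, value) in sorted(run_with_brokers(demo, 3).items()):
        print(f"task {task_id} ran on processor {proc}: {value}")
```

Because each broker pulls work only when its processor is free, uneven task costs are balanced at run time without any placement decision being made at compile time, which is the adaptive behaviour the abstract aims for.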
Performance characteristics of irregular programs on parallel architectures were studied. Results in...
High Level Synthesis (HLS) provides a way to significantly enhance the productivity of embedded syst...
In this paper, we present a relatively primitive execution model for fine-grain parallelism, in whic...
Today’s embedded systems depend on the availability of hybrid platforms that contain heterogeneous ...
Modern designs for embedded systems are increasingly embracing cluster-based architectures, where sm...
The goal of parallelizing, or restructuring, compilers is to detect and exploit parallelism in seque...
This paper addresses the problem of load balancing data-parallel computations on heterogeneous and t...
In this work we present the analysis, on a dynamic processor allocation environment, of fo...
We propose in this paper a thread-based software synthesis technique to reduce communication overhea...
This study explores the design space of thread scheduler on the resource-constrained embedded run-ti...
Due to energy constraints, high performance computing platforms are becoming increasingly heterogene...
Efficiently scheduling parallel tasks onto the processors of a multiprocessor system is critical to ...
In this paper we present Dynamic Bisectioning or DBS, a simple but powerful comprehensive scheduli...
With the current trend of multiprocessor machines towards more and more hierarchical architectures, ...