In recent years, the rapid evolution of Multiprocessor System-on-Chip (MPSoC) architectures has posed new challenges for programming them efficiently. To exploit the available hardware concurrency, software developers are still expected to handle many low-level programming details, including utilizing DMA, ensuring cache coherency, and explicitly inserting synchronization primitives. Software portability is yet another issue: in the current state of the art, hardware vendors supply vendor-specific software development toolchains, which makes it harder to port applications across different architectures without restructuring the code while still ensuring efficiency. In this paper, we extend the...