Nowadays embedded systems are increasingly used in the world of distributed computing to provide more computational power without having to change the whole system and the programming model. We propose a DataFlow Execution Engine (DEE) to spawn asynchronous, data-driven threads, among embedded cores to achieve a seamless distribution of threads without the need of using a distributed programming model. Our idea relies on the creation of a hardware scheduler that can handle creation, thread-dependency, and locality of many fine-grained tasks. We present an initial evaluation of our DEE that is suited for FPGA implementation. Our initial results show the importance of a hardware based support for such thread execution model