Cloud computing is attracting growing attention in both academic research and industry, and it is widely used to solve advanced computational problems. As cloud datacentres continue to grow in scale and complexity, the risk of failure increases for Virtual Machines (VMs) and hosts that run many jobs and process large volumes of user requests, and potential failures within a datacentre become even more difficult to predict. Although fault tolerance is an issue of growing concern in cloud and HPC systems, mitigating the impact of failures and providing accurate predictions with enough lead time remains a difficult research problem. Traditional fault-tolerance strategies such ...