This thesis focuses on scalable data management solutions to accelerate service provisioning and enable efficient execution of data-intensive applications in large-scale distributed clouds. Data-intensive applications are increasingly running on distributed infrastructures (multiple clusters). The main two reasons for such a trend are 1) moving computation to data sources can eliminate the latency of data transmission, and 2) storing data on one site may not be feasible given the continuous increase of data size.On the one hand, most applications run on virtual clusters to provide isolated services, and require virtual machine images (VMIs) or container images to provision such services. Hence, it is important to enable fast provisioning of...