Thanks to the exponential growth of data that needs to be processed in cloud datacenters, data parallel frameworks such as MapReduce and Spark have emerged as foundations of cloud computing. Improving the performance of data analytics jobs running in a shared cluster has become increasingly important. Towards this objective, this dissertation investigates the following research problems. Utility-Optimal Coflow Scheduling. A coflow represents a set of network flows in the communication stage of a data parallel job. The completion time of a job is determined by the collective behavior of a coflow and influenced by the amount of network bandwidth allocated to it. We focus on the design and implementation of a new utility optim...
Coflow is a recently proposed network abstraction to capture communication patterns in data centers....
Communication in data-parallel applications often involves a collection of parallel flows. Traditio...
In the data flow models of today’s data center applications such as MapReduce, Spark and Dr...
A variety of Internet applications rely on big data analytics frameworks to efficiently process larg...
Over the past decade, the confluence of an unprecedented growth in data volumes and the rapid rise o...
This dissertation focuses on algorithm design and prototype implementation of fair sharing policies ...
Data parallel applications in data centers generate, process, and store huge volumes of data. Coflow...
© 2018 IEEE. Many datacenters usually process complex jobs such as MapReduce jobs. From a network pe...
In current data centers, an application (e.g., MapReduce, Dryad, search platform, etc.) usually gene...
Efficient execution of distributed database operators such as joining and aggregating is critical fo...
Datacenters have emerged as the dominant form of computing infrastructure over the last two decades....
Data analytics frameworks enable users to process large datasets while hiding the complexity of scal...