ABSTRACT Massive data analysis in cloud-scale data centers plays a crucial role in making critical business decisions. High-level scripting languages free developers from understanding various system trade-offs, but introduce new challenges for query optimization. One key optimization challenge is missing accurate data statistics, typically due to massive data volumes and their distributed nature, complex computation logic, and frequent usage of user-defined functions. In this paper we propose novel techniques to adapt query processing in the Scope system, the cloud-scale computation environment in Microsoft Online Services. We continuously monitor query execution, collect actual runtime statistics, and adapt parallel execution plans as the ...
Cloud computing has become a widely used environment for database querying. In...
Cloud-based data analysis is nowadays common practice because of the lower system management overhea...
Big data systems such as relational databases, data science platforms, and scientific workflows all ...
In the last decade, the world wide web has grown from being a platform where users passively viewed ...
ABSTRACT Enterprises are adapting large-scale data processing platforms, such as Hadoop, to gain act...
The last decade witnessed the emergence of various distributed storage and computation systems for ...
With more and more businesses and organizations outsourcing their IT services to distributed clouds...
This thesis addresses a fundamental data management challenge faced by cloud service providers: anal...
In online aggregation, a database system processes a user’s aggregation query in an online fashion....
A major component of many cloud services is query processing on data stored in the underlying cloud ...
Streaming analytics applications need to process massive volumes of data in a timely manner, in doma...
The performance of the MapReduce-based Cloud data warehouses mainly depends on the virtual hardware ...
Abstract—As the size of data set in cloud increases rapidly, how to process large amount of data eff...
In the recent years, large-scale data analysis has become critical to the success of modern enterpri...
Over the past decade, a number of data intensive scalable systems have been developed to process ext...