To meet today's data management needs, distributed data storage and processing systems are widely used. Since the publication of the MapReduce paradigm, a plethora of such systems has arisen. Although they are widespread, their capabilities are still poorly understood, and putting them to effective use is often more of an art than a science. As one cause, we identify a lack of theoretical underpinnings for these systems, which makes it hard to understand the advantages and disadvantages of particular systems and, in addition, complicates the choice of a particular formalism for a particular task. In this PhD thesis, we zoom in on several important aspects of query evaluatio...
The inevitability of the relationship between big data and distributed systems is indicated by the f...
Processing and storage of a large amount of information is one of the difficult and interesting task...
MapReduce is a framework for processing and managing large-scale datasets in a distributed cluster, ...
Big data is not new to us. Many efforts are devoted to efficient and parallel query processing of b...
The growing popularity of Resource Description Framework (RDF) as a mode for data exchange and integ...
To simplify data integration and exchange, modern applications often represent their data using the...
With more and more businesses and organizations outsourcing their IT services to distributed clouds...
Big Data systems manage and process huge volumes of data constantly generated by various technologie...
Distributed data processing is becoming a reality. Businesses want to do it for many reasons, and th...
Timely and cost-effective analytics over "big data" has emerged as a key ingredient for success in m...
In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting...
With Cloud Computing emerging as a promising new approach for ad-hoc parallel data processing, major...
Part 1: Keynote Lectures. The paper is focused on today’s very popular theme – B...
Scalable by design to very large computing systems such as grids and clouds, MapReduce is currently ...
Evaluating joins over RDF data stored in a shared-nothing server cluster is key to processing truly ...