In this paper, we draw up the specifications of a novel benchmark for comparing parallel processing frameworks in the context of big data applications hosted in the cloud. We aim to fill several gaps in existing cloud data processing benchmarks, which lack a real-life context for their processes and thus lose relevance when assessing performance for real applications. Hence, we propose a fictitious news site hosted in the cloud that is to be managed by the framework under analysis, together with several objective use case scenarios and measures for evaluating system performance. The main strengths of our benchmark are parallelization capabilities supporting cloud features and big data properties.
The volume, variety, and velocity properties of big data and the valuable information it contains ha...
Abstract—How can applications be deployed on the cloud to achieve maximum performance? This question...
The massively increasing amount of often geographically dispersed large quantities of data of experi...
With Cloud Computing emerging as a promising new approach for ad-hoc parallel data processing, major...
The recent boom of big data, coupled with the challenges of its processing and storage gave rise to ...
Abstract—There is an increasing demand for processing tremendous volumes of data, which promotes the...
In many fields of research and business data sizes are breaking the petabyte barrier. This imposes n...
Abstract—The paper presents a workflow application for efficient parallel processing of data download...
Abstract—Today’s lightning-fast data generation from massive sources is calling for efficient big d...
While the use of MapReduce systems (such as Hadoop) for large scale data analysis has been widely re...
The emerging Big Data paradigm has attracted attention from a wide variety of industry sectors, incl...
Big Data systems manage and process huge volumes of data constantly generated by various technologie...
Big Data is a data analysis methodology enabled by recent advances in technologies and architecture....
Cloud computing, with its promise of virtually infinite resources, seems to suit well in solving res...
How can applications be deployed on the cloud to achieve maximum performance? This question has beco...