As a distributed data-parallelization (DDP) pattern, MapReduce has been adopted by many new big data analysis tools to achieve good scalability and performance in cluster or cloud environments. This paper explores how two binary DDP patterns, CoGroup and Match, could also be used in these tools. We re-implemented an existing bioinformatics tool, CloudBurst, with three different DDP pattern combinations. We identify two factors, input data balancing and value sparseness, that can greatly affect performance under different DDP patterns. Our experiments show that (i) simply switching the DDP pattern can speed up execution by almost two times, and (ii) the identified factors explain the performance differences well. Categories and S...
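To make the difference between the two binary patterns concrete, here is a minimal single-machine sketch of their commonly described semantics: Match calls the user function once per pair of records that share a key across the two inputs, while CoGroup calls it once per key with all values from both inputs. The function names (`match`, `cogroup`, `group_by_key`) and the toy records are hypothetical illustrations, not the paper's implementation or any framework's API.

```python
from collections import defaultdict
from itertools import product


def group_by_key(records):
    """Collect (key, value) records into a dict of key -> list of values."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups


def match(left, right, udf):
    """Match: call udf once for every pair of values that share a key."""
    lg, rg = group_by_key(left), group_by_key(right)
    out = []
    for key in sorted(lg.keys() & rg.keys()):  # keys present in both inputs
        for lv, rv in product(lg[key], rg[key]):
            out.append(udf(key, lv, rv))
    return out


def cogroup(left, right, udf):
    """CoGroup: call udf once per key with all values from both inputs."""
    lg, rg = group_by_key(left), group_by_key(right)
    out = []
    for key in sorted(lg.keys() | rg.keys()):  # keys present in either input
        out.append(udf(key, lg.get(key, []), rg.get(key, [])))
    return out


# Hypothetical toy inputs: two keyed datasets with partially overlapping keys.
reads = [("ACG", "read1"), ("ACG", "read2"), ("TTA", "read3")]
reference = [("ACG", "ref@10"), ("GGC", "ref@42")]

matched = match(reads, reference, lambda k, l, r: (k, l, r))
cogrouped = cogroup(reads, reference, lambda k, ls, rs: (k, len(ls), len(rs)))
```

In this sketch, Match produces many small independent calls, while CoGroup produces fewer calls that may each see an empty or a very large value list. That is one way to picture why factors such as input data balancing and value sparseness could matter when choosing between the patterns.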
Big Data systems manage and process huge volumes of data constantly generated by various technologie...
The next-generation sequencing instruments enable biological researchers to generate voluminous amou...
With the development of computer technology, there is a tremendous increase in the growth of...
Part 4: Big Data+Cloud. Great efforts have been made on meta-genomics in the fie...
Mining with big data or big data mining has become an active research area. It is very d...
With Cloud Computing emerging as a promising new approach for ad-hoc parallel data processing, major...
The volume, variety, and velocity properties of big data and the valuable information it contains ha...
Frequent pattern mining is an essential data mining task, with a goal of discovering knowledge in th...
In the field of biology, researchers need to compare genes or gene products using semantic similarit...
Generalizable approaches, models, and frameworks for irregular application scalability is an old yet...
Clustering is defined as the process of grouping a set of objects in a way that objects in the same ...
One of the solutions to enable scalable 'big data' analysis and analytics is to take advantage of pa...
One of the solutions to enable scalable 'big data' analysis and analytics is to take advantage of pa...
In the big data era, the need for fast robust machine learning techniques is rapidly increas...
Big data is the process of handling large datasets. In today’s scenario, data is growing ex...