Big data analytics frameworks, such as Spark and Giraph, need to process and cache massive amounts of data that do not always fit on the managed heap. Therefore, frameworks temporarily move long-lived objects outside the managed heap (off-heap) onto a fast storage device. However, this practice results in (1) high serialization/deserialization (S/D) cost and (2) high memory pressure when off-heap objects are moved back to the heap for processing. In this paper, we propose TeraHeap, a system that eliminates S/D overhead and expensive GC scans for a large portion of the objects in big data frameworks. TeraHeap relies on three concepts. (1) It eliminates S/D cost by extending the managed runtime (JVM) to use a second high-capacity heap (H2) ove...
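The S/D cost and memory pressure described in the TeraHeap abstract can be made concrete with a minimal Java sketch of the conventional off-heap caching pattern. This is not TeraHeap's mechanism. It assumes standard `java.io` serialization and a direct `ByteBuffer` as a stand-in for the fast storage device; the class `OffHeapCacheSketch`, the record type `Rec`, and the methods `evictOffHeap` and `loadOnHeap` are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class OffHeapCacheSketch {
    // Hypothetical record type standing in for one element of a cached partition.
    static final class Rec implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final double value;
        Rec(long id, double value) { this.id = id; this.value = value; }
    }

    // Serialization (S): encode heap objects as bytes and copy them into a
    // direct buffer, i.e. memory outside the GC-managed heap.
    static ByteBuffer evictOffHeap(List<Rec> partition) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(partition));
        }
        byte[] data = bytes.toByteArray();
        ByteBuffer offHeap = ByteBuffer.allocateDirect(data.length);
        offHeap.put(data);
        offHeap.flip(); // prepare the buffer for reading
        return offHeap;
    }

    // Deserialization (D): copy the bytes back and rebuild brand-new heap
    // objects, which the GC must allocate and trace all over again.
    @SuppressWarnings("unchecked")
    static List<Rec> loadOnHeap(ByteBuffer offHeap) throws IOException, ClassNotFoundException {
        byte[] data = new byte[offHeap.remaining()];
        offHeap.duplicate().get(data); // duplicate() leaves the original position untouched
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<Rec>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Rec> partition = new ArrayList<>();
        for (int i = 0; i < 1000; i++) partition.add(new Rec(i, i * 0.5));

        ByteBuffer cached = evictOffHeap(partition);
        List<Rec> restored = loadOnHeap(cached);

        System.out.println("off-heap bytes: " + cached.remaining());
        // Prints "false": every reload yields fresh copies, not the original objects.
        System.out.println("same objects after reload? " + (restored.get(0) == partition.get(0)));
    }
}
```

Each round trip pays for encoding and decoding and re-materializes the whole partition on the managed heap. The abstract states that TeraHeap's first concept targets exactly this S/D step by having the JVM manage a second high-capacity heap (H2) instead of an opaque byte store.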
Planning optimized memory management is critical for Big Data analysis tools to perform faster runti...
Over the past decade, the increasing demands on data-driven business intelligence have led to the p...
Hadoop provides a scalable solution on traditional cluster-based Big Data platforms but imposes per...
Big data frameworks, such as Spark and Giraph, suffer from high memory pressure because they allocat...
Many Big Data analytics and IoT scenarios rely on fast and non-relational storage (NoSQL) to help pr...
The past decade has witnessed the increasing demands on data-driven business intelligence that led t...
A proliferation of frameworks has emerged to handle the challenges of making distributed computatio...
Big Data systems have been used for multiple years to solve problems that require scale. A framework...
Large-scale data analytical applications such as social network analysis and web analysis have revol...
Many popular systems for processing “big data” are implemented in high-level programming languages...
The sheer increase in the volume of data over the last decade has triggered research in cluster computing fr...
The sheer increase in the volume of data over the last decade has triggered research in cluster comp...
This is a post-peer-review, pre-copyedit version of an article published in Journal of Parallel and ...