Powerful abstractions such as dataframes are only as efficient as their underlying runtime system. The de facto distributed data processing framework, Apache Spark, is poorly suited to modern cloud-based data-science workloads because of its outdated assumptions: static datasets analyzed using coarse-grained transformations. In this paper, we introduce the Indexed DataFrame, an in-memory cache that exposes a dataframe abstraction with indexing capabilities for fast lookup and join operations. Moreover, it supports appends with multi-version concurrency control. We implement the Indexed DataFrame as a lightweight, standalone library that can be integrated into existing Spark programs with minimal effort. We analyze the ...
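To make the combination of indexing, appends, and versioned reads concrete, the following is a minimal single-process sketch in Python. It is not the Indexed DataFrame library or its API: the class `ToyIndexedTable`, its methods, and the use of the row count as a version number are all assumptions made for illustration. It only shows how a hash index can serve point lookups and index-based joins while readers holding an older version never see rows appended later.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Any, ...]


class ToyIndexedTable:
    """Hypothetical append-only table with a hash index on one column."""

    def __init__(self, key_col: int) -> None:
        self.key_col = key_col
        self.rows: List[Row] = []
        # Maps a key value to the positions of all rows carrying that key.
        self.index: Dict[Any, List[int]] = defaultdict(list)

    def append(self, new_rows: List[Row]) -> int:
        """Append rows and return the new version (row count after the append)."""
        for row in new_rows:
            self.index[row[self.key_col]].append(len(self.rows))
            self.rows.append(row)
        return len(self.rows)

    def lookup(self, key: Any, version: Optional[int] = None) -> List[Row]:
        """Return rows matching `key` that are visible at `version`."""
        if version is None:
            version = len(self.rows)  # default: read the latest version
        positions = self.index.get(key, [])
        # Rows appended after `version` have positions >= version and are hidden.
        return [self.rows[p] for p in positions if p < version]

    def join(self, probe_rows: List[Row], probe_col: int,
             version: Optional[int] = None) -> List[Row]:
        """Index-based join: probe the index once per incoming row."""
        out: List[Row] = []
        for probe in probe_rows:
            for match in self.lookup(probe[probe_col], version):
                out.append(probe + match)
        return out


table = ToyIndexedTable(key_col=0)
v1 = table.append([(1, "a"), (2, "b")])
v2 = table.append([(1, "c")])
old_view = table.lookup(1, version=v1)   # sees only the first append
new_view = table.lookup(1, version=v2)   # sees both appends
```

The sketch leaves out everything that makes the real problem hard (partitioning across a cluster, concurrent writers, memory layout); it is meant only to illustrate why an index avoids a full scan on lookups and joins, and how a version number can isolate readers from later appends.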
While cluster computing frameworks are continuously evolving to provide real-time data analysis cap...
The past few years have seen a major change in computing systems, as growing data volumes and stalli...
© 2017 Conference on Innovative Data Systems Research (CIDR). All rights reserved. Modern analytics ...
As data science gets deployed more and more into operational applications, it becomes important for ...
The sheer increase in the volume of data over the last decade has triggered research in cluster comp...
Thanks to its RDataFrame interface, ROOT now supports the execution of the same physics analysis cod...
In the last decade, data analytics have rapidly progressed from traditional disk-based processing to mod...
Many analytic applications built on the Hadoop ecosystem have a propensity to iteratively perform repeti...
Sheer increase in volume of data over the last decade has triggered research in cluster computing fr...
While cluster computing frameworks are continuously evolving to provide real-time data analysis capa...
The proliferation of big-data processing platforms has already led to radically different system des...
As dataset sizes increase, data analysis tasks in high performance computing (HPC) are increasingly ...