As data science is increasingly deployed in operational applications, data science frameworks must be able to perform computations at interactive, sub-second latencies. Indexing and caching are two key techniques that make interactive query processing on large datasets possible. In this demo, we show the design, implementation and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, in datasets that are ...
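The three ideas named in this abstract (a cached table, an index for fast lookup and join, and multi-version updates) can be illustrated with a small in-memory toy. This is a hypothetical sketch, not the Indexed DataFrame's actual API or Spark code: the class `IndexedTable` and its methods `append`, `lookup` and `join` are invented for illustration. It assumes a single-threaded process, uses a plain hash index on one key column, and models multi-versioning only as append-only rows tagged with a version number, so a reader can query a fixed snapshot while newer rows keep arriving.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


class IndexedTable:
    """Hypothetical toy: cached rows + hash index + versioned (append-only) updates.

    Not Spark's API. Single-threaded; no locking or garbage collection of old versions.
    """

    def __init__(self, key: str):
        self.key = key
        self.rows: List[Tuple[int, Row]] = []                 # (version, row), never overwritten
        self.index: Dict[Any, List[int]] = defaultdict(list)  # key value -> row positions
        self.version = 0

    def append(self, new_rows: List[Row]) -> int:
        """Append a batch of rows as one new version and return that version number."""
        self.version += 1
        for row in new_rows:
            self.index[row[self.key]].append(len(self.rows))
            self.rows.append((self.version, row))
        return self.version

    def lookup(self, key_value: Any, as_of: Optional[int] = None) -> List[Row]:
        """Point lookup through the index, seeing only rows visible at snapshot `as_of`."""
        snapshot = self.version if as_of is None else as_of
        positions = self.index.get(key_value, [])  # .get avoids inserting empty entries
        return [row for v, row in (self.rows[i] for i in positions) if v <= snapshot]

    def join(self, probe_rows: List[Row], probe_key: str,
             as_of: Optional[int] = None) -> List[Row]:
        """Index join: probe the index once per incoming row instead of scanning the table.

        On column-name collisions, the indexed table's values win.
        """
        out: List[Row] = []
        for p in probe_rows:
            for match in self.lookup(p[probe_key], as_of):
                out.append({**p, **match})
        return out


# Usage: a tiny edge list indexed on the source vertex.
edges = IndexedTable(key="src")
v1 = edges.append([{"src": 1, "dst": 2}, {"src": 1, "dst": 3}])  # version 1
v2 = edges.append([{"src": 1, "dst": 4}, {"src": 2, "dst": 3}])  # version 2
print(len(edges.lookup(1)))            # 3 (latest snapshot)
print(len(edges.lookup(1, as_of=v1)))  # 2 (rows from version 2 are invisible)
```

Because old rows are never overwritten, a query pinned to `v1` keeps returning the same answer after later appends; a real system would also need concurrency control between writers and a way to reclaim versions no reader still needs.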
This talk is about sharing our recent experiences in providing a data analytics platform based on Apac...
The constant flux of data and queries alike has been pushing the boundaries of data analysis systems...
As the Web of Data is growing at an ever-increasing speed, the lack of reliable query solutions for ...
Powerful abstractions such as dataframes are only as efficient as their underlying runtime system. T...
Through new digital business models, the importance of big data analytics continuously grows. Initia...
A growing number of domains (finance, seismology, internet-of-things, etc.) co...
Modern data analysis is undergoing a "Big Data" transformation: organizations are generating and g...
Field of study: Computer science. Dr. Chi-Ren Shyu, Thesis Supervisor. "May 2017." [ACCESS RESTRICTED T...
Low-latency, high-throughput systems for serving interactive queries are crucial to today's web serv...
The sheer increase in the volume of data over the last decade has triggered research in cluster comp...
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-a...
The intensive research activity in analysis of social media and micro-blogging data in rec...