Data producers typically optimize the layout of data files to minimize write time. In most cases, data analysis tasks read these files with access patterns that differ from the write patterns, resulting in poor read performance. In this paper, we introduce Scientific Data Services (SDS), a framework for bridging the performance gap between writing and reading scientific data. SDS reorganizes data to match the read patterns of analysis tasks and enables transparent data reads from the reorganized data. We implemented an HDF5 Virtual Object Layer (VOL) plugin to redirect HDF5 dataset read calls to the reorganized data. To demonstrate the effectiveness of SDS, we applied two parallel data organization techniques: a sort-based organization on a pl...
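The idea of reorganizing data to match a read pattern can be illustrated with a minimal sketch. This is not the SDS implementation and does not use HDF5 or the VOL interface; it only assumes an in-memory list of records and a range query on one column. The function names (`reorganize_by_key`, `range_read_original`, `range_read_reorganized`) are hypothetical and chosen for illustration.

```python
from bisect import bisect_left, bisect_right


def reorganize_by_key(records, key_index):
    """Build a sorted copy of the records plus a list of their sort keys.

    Hypothetical stand-in for a sort-based reorganization: the original
    write-order data is left untouched and a read-optimized copy is made.
    """
    sorted_records = sorted(records, key=lambda r: r[key_index])
    keys = [r[key_index] for r in sorted_records]
    return sorted_records, keys


def range_read_original(records, key_index, lo, hi):
    """Range query on write-order data: every record must be inspected."""
    return [r for r in records if lo <= r[key_index] <= hi]


def range_read_reorganized(sorted_records, keys, lo, hi):
    """Range query on the sorted copy: two binary searches locate one
    contiguous slice, so only the matching records are touched."""
    start = bisect_left(keys, lo)
    stop = bisect_right(keys, hi)
    return sorted_records[start:stop]


if __name__ == "__main__":
    # Records in write order (illustrative values only).
    records = [(3.0, "c"), (1.0, "a"), (2.0, "b"), (5.0, "e"), (4.0, "d")]
    sorted_records, keys = reorganize_by_key(records, 0)
    print(range_read_reorganized(sorted_records, keys, 2.0, 4.0))
```

Both read functions return the same set of records for a given range; the reorganized version returns them in key order and reads a single contiguous region, which is the property a sorted on-disk layout would exploit.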
Many scientific applications are I/O intensive and generate large data sets, spanning hundreds or t...
In recent years, the Hadoop Distributed File System (HDFS) has been deployed as the bedrock for many para...
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-a...
Large-scale scientific applications typically write their data to parallel file systems with organi...
Abstract—Scientific experiments and simulations produce mountains of data in file formats, such as H...
Abstract—Performance of reading scientific data from a parallel file system depends on the organizat...
Scientific experiments and large-scale simulations produce massive amounts of data. Many of these sc...
In order to run tasks in a parallel and load-balanced fashion, existing scientific parallel applicat...
Visualization is a highly data intensive science: visualization algorithms take as input vast amount...
As scientific simulations and experiments move toward extremely large scales and generate massive am...
Whereas traditional scientific applications are computationally intensive, recent applications requi...
Many scientific applications have large I/O requirements, in terms of both the size of data and the ...