In order to run tasks in a parallel and load-balanced fashion, existing scientific parallel applications such as mpiBLAST introduce a data-initializing stage to move database fragments from shared storage to local cluster nodes. Unfortunately, with the exponentially increasing size of sequence databases in today's big data era, such an approach is inefficient. In this paper, we develop a scalable data access framework to solve the data movement problem for scientific applications that are dominated by read operations for data analysis. SDAFT employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: (1) a data-centric load-balanced scheduler (DC-s...
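The abstract's core idea is to move computation to the data rather than copying database fragments to compute nodes. A minimal sketch of such a data-centric, load-balanced placement policy is shown below. This is an illustration only, not SDAFT's actual scheduler: the task list, replica map, and greedy least-loaded tie-breaking are all assumptions for the example.

```python
# Illustrative sketch of data-centric, load-balanced task placement
# (not SDAFT's implementation). Each search task targets one database
# fragment; a DFS has already replicated fragments across nodes. We
# greedily assign every task to the least-loaded node that already
# stores its fragment, so tasks run where the data lives and no
# fragment needs to be moved at job start.

def place_tasks(tasks, replicas):
    """tasks: list of fragment ids, one per task;
    replicas: fragment id -> list of nodes holding a copy."""
    load = {}       # node -> number of tasks assigned so far
    placement = {}  # task index -> chosen node
    for i, frag in enumerate(tasks):
        # restrict candidates to nodes that hold this fragment locally
        candidates = replicas[frag]
        # among those, pick the node with the smallest current load
        node = min(candidates, key=lambda n: load.get(n, 0))
        placement[i] = node
        load[node] = load.get(node, 0) + 1
    return placement

replicas = {"f0": ["n1", "n2"], "f1": ["n2", "n3"], "f2": ["n1", "n3"]}
print(place_tasks(["f0", "f1", "f2", "f0"], replicas))
```

Because every assignment respects fragment locality, the expensive data-initializing stage the abstract criticizes is avoided; load balance comes from the per-node counter rather than from redistributing data.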
Large data stores are pushing the limits of modern technology. Parallel file systems provide high I/...
Scientific applications at exascale generate and analyze massive amounts of data. A critical require...
Data producers typically optimize the layout of data files to minimize the write time. In m...
Large-scale scientific applications typically write their data to parallel file systems with organiz...
In recent years, the Hadoop Distributed File System (HDFS) has been deployed as the bedrock for many para...
Whereas traditional scientific applications are computationally intensive, recent applications requi...
Many scientific applications have large I/O requirements, in terms of both the size of data and the ...
We seek to enable efficient large-scale parallel execution of applications in which a shar...
Scientific experiments and large-scale simulations produce massive amounts of data. Many of these sc...
To facilitate big data processing, many dedicated data-intensive storage systems such as Google File...
The distributed file system, HDFS, is widely deployed as the bedrock for many parallel big data anal...