To run tasks in a parallel and load-balanced fashion, existing scientific parallel applications such as mpiBLAST introduce a data-initialization stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, with the exponentially increasing size of sequence databases in today's big data era, such an approach is inefficient. In this paper, we develop a scalable data access framework, SDAFT, to solve the data movement problem for scientific applications that are dominated by read operations for data analysis. SDAFT employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: (1) a data-centric load-balanced scheduler (DC-s...
Scientific applications at exascale generate and analyze massive amounts of data. A critical require...
Emerging scientific workflows in high performance computing (HPC) focus more on analysis rather than...
One of the challenges brought by large-scale scientific applications is how to avoid remote storage acc...
Large-scale scientific applications typically write their data to parallel file systems with organiz...
In recent years, the Hadoop Distributed File System (HDFS) has been deployed as the bedrock for many para...
Whereas traditional scientific applications are computationally intensive, recent applications requi...
Many scientific applications have large I/O requirements, in terms of both the size of data and the ...
We seek to enable efficient large-scale parallel execution of applications in which a shar...
Scientific experiments and large-scale simulations produce massive amounts of data. Many of these sc...
To facilitate big data processing, many dedicated data-intensive storage systems such as Google File...
Large data stores are pushing the limits of modern technology. Parallel file systems provide high I/...
Data producers typically optimize the layout of data files to minimize the write time. In most cases...
The distributed file system, HDFS, is widely deployed as the bedrock for many parallel big data anal...