Abstract—In this paper, we present the design and implementation of an application-layer data throughput prediction and optimization service for many-task computing in widely distributed environments. The service uses multiple parallel TCP streams to improve the end-to-end throughput of data transfers. We develop a novel mathematical model that determines the number of parallel streams needed to achieve the best performance; the model can predict this optimal number from as few as three prediction points. We implement the service in the Stork data scheduler, where the prediction points can be obtained by sampling with Iperf and GridFTP. Our results show that the prediction cost plus the optimized transfer time is much less than th...
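To illustrate how three prediction points can be enough to pick a stream count, the sketch below fits a three-parameter curve through three sampled (streams, throughput) pairs and then searches for the best integer stream count. The curve family Th(n) = n / sqrt(a*n^2 + b*n + c) is an assumption made for this example; the abstract does not state the model's form. The function names, the sample stream counts (1, 4, 16), and the cap of 32 streams are likewise hypothetical.

```python
import math


def fit_throughput_model(samples):
    """Fit Th(n) = n / sqrt(a*n^2 + b*n + c) through exactly three samples.

    `samples` is a list of three (stream_count, measured_throughput) pairs
    with distinct stream counts. The model form is an assumption of this
    sketch, not taken from the abstract.
    """
    if len(samples) != 3:
        raise ValueError("exactly three prediction points are required")
    (n1, _), (n2, _), (n3, _) = samples
    # Rearranging the model gives y = (n / Th)^2 = a*n^2 + b*n + c,
    # a plain quadratic in n that three points determine uniquely.
    y1, y2, y3 = [(n / th) ** 2 for n, th in samples]
    # Newton divided differences for the quadratic through the three points.
    d12 = (y2 - y1) / (n2 - n1)
    d23 = (y3 - y2) / (n3 - n2)
    a = (d23 - d12) / (n3 - n1)
    b = d12 - a * (n1 + n2)
    c = y1 - a * n1 ** 2 - b * n1
    return a, b, c


def predict_throughput(n, a, b, c):
    """Predicted throughput for n streams; 0.0 where the model is undefined."""
    denom = a * n * n + b * n + c
    if denom <= 0:
        return 0.0
    return n / math.sqrt(denom)


def optimal_streams(a, b, c, max_streams=32):
    """Integer stream count in [1, max_streams] with the highest prediction."""
    return max(range(1, max_streams + 1),
               key=lambda n: predict_throughput(n, a, b, c))


if __name__ == "__main__":
    # Synthetic samples generated from known coefficients (not real data).
    true_coeffs = (1.0, -8.0, 40.0)
    points = [(n, predict_throughput(n, *true_coeffs)) for n in (1, 4, 16)]
    coeffs = fit_throughput_model(points)
    print("fitted coefficients:", coeffs)
    print("predicted best stream count:", optimal_streams(*coeffs))
```

In practice the three samples would come from short probe transfers (the abstract mentions Iperf and GridFTP), so measurement noise would make the fit approximate rather than exact as it is with the synthetic points used here.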
Accurate workload prediction and throughput estimation are key to efficient proactive power and per...
Current scientific applications have been producing large amounts of data. The processing, handling ...
The computational grid is becoming the platform of choice for large-scale distributed data-intensive...
Design and implementation of an application-layer data throughput prediction and optimization servic...
Parallel TCP flows are broadly used in the high performance distributed computing community to enha...
Abstract—TCP throughput prediction is an important capability for networks where multiple paths exis...
In this paper, we present the Stork Data Scheduler as a solution for mitigating the data bottleneck ...
In distributed systems, knowledge of the network is mandatory to know the av...
Parallel and distributed systems are pervasive, such as web services, clouds, and cyber-physical sys...
Data transfer in scientific workflows is attracting growing attention due to large amounts of ...
As more aspects of our daily lives are being computerized, ever larger amounts of data are being pro...
Today's society faces an unprecedented deluge of data that requires processing...
Applications that use parallel TCP streams to increase throughput must multiplex and demultiplex dat...
Parallel TCP, which opens multiple TCP connections over a single direct path, and Multi-Pathing, whi...
Performance prediction is set to play a significant role in supportive middleware that is designed t...