Tools commonly leveraged to tackle large-scale data science workflows have traditionally shied away from existing high performance computing paradigms, largely due to their lack of fault tolerance and computation resiliency. However, these concerns are typically only of critical importance to problems tackled by technology companies at the highest level. For the average data scientist, the benefits of resiliency may not be as important as the overall execution performance. To this end, the work of this thesis aims to develop prototypes of tools favored by the data science community that function in a data-parallel environment, taking advantage of functionality commonly used in high performance computing. To achieve this goal, a prototype-di...
Modern open source high-level languages such as R and Python are increasingly playing an important r...
With diminishing gains in processing power from successive generations of hardware development, ther...
The processing of massive amounts of data on clusters with a finite amount of memory has become an imp...
In this paper, we introduce DistNumPy, a library for doing numerical computation in Python that tar...
High performance computing is becoming more important in many areas by providing fast, reliable and cost...
This work presents two software components aimed to relieve the costs of accessing high-performance ...
The Python programming language has gradually gained popularity in the field of scientific computing...
This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows...
In this paper, we introduce a model for managing abstract data structures that map to arbitrary dist...
Python has been adopted as a programming language by a large number of scientific communities. Additio...
Despite advancements in the areas of parallel and distributed computing, the complexity of programmi...
The use of the Python programming language for scientific computing has been gaining momentum in the...
©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for al...
MPI4Py provides open source Python bindings to most of the functionality of MPI-1/2/3 specifications...
The two most popular programming languages for data science are Python and R. Both are dyn...