ABSTRACT Due to R's popularity as a data-mining tool, many distributed systems expose an R-based API to users who need to build a distributed application in R. As a result, data scientists have to learn to use different interfaces such as RHadoop, SparkR, Revolution R's ScaleR, and HPE's Distributed R. Unfortunately, these interfaces are custom, nonstandard, and difficult to learn. Not surprisingly, R applications written in one framework do not work in another, and each backend infrastructure has spent redundant effort in implementing distributed machine learning algorithms. Working with the members of R-core, we have created ddR (Distributed Data structures in R), a unified system that works across different distributed fra...
While many existing formal concept analysis algorithms are efficient, they are typically unsuitable ...
Many big data algorithms executed on MapReduce-like systems have a shuffle phase that often dominate...
This paper presents two complementary statistical computing frameworks that address challenges in pa...
Implementing machine learning algorithms in a distributed environment offers multiple advan...
Theoretically, many modern statistical procedures are trivial to parallelize. However, practical de...
The data science community today has embraced the concept of Dataframes as the de facto standard for...
The advent of algorithms capable of leveraging vast quantities of data and computational resources h...
The Resilient Distributed Dataset (RDD) is the core memory abstraction behind the popular data-analy...
This paper presents DistRDF2ML, the generic, scalable, and distributed framework for creating in-mem...
ABSTRACT The rise of big data has led to new demands for machine learning (ML) systems to learn compl...
The recent cloud computing revolution has changed the distributed computing landscape, making the re...
The rise of big data has led to new demands for machine learning (ML) systems to learn complex model...
In the last couple of years, the amount of data to be analyzed in different areas has grown rapidly. Exa...
In recent years, growing data volumes and more sophisticated computational procedures have greatly i...