Data summarization is an essential mechanism to accelerate analytic algorithms on large data sets. In this work we present a comprehensive data summarization matrix, namely the Gamma matrix, from which we can derive equivalent equations for many analytic algorithms. In this way, iterative algorithms are restructured to work in two phases: (1) incremental and parallel summarization of the data set in one pass; (2) iteration in main memory, exploiting the summarization matrix in many intermediate computations. We show that our summarization matrix captures essential statistical properties of the data set and allows iterative algorithms to run significantly faster in main memory. Specifically, we show our summarization matrix benefits statistical models, inc...
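As a minimal sketch of how the two-phase scheme can be organized, assuming the usual construction Gamma = Z^T Z with augmented rows z_i = (1, x_i, y_i); the function names and the chunked NumPy interface below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def gamma_summarize(chunks):
    """One-pass, incremental computation of the Gamma summarization matrix.

    Each chunk is an (n_i, d+1) array whose last column is the output y.
    Rows are augmented with a leading 1 so that Gamma = Z^T Z accumulates
    the count n, the linear sums, and the quadratic sums in one matrix.
    """
    gamma = None
    for chunk in chunks:
        ones = np.ones((chunk.shape[0], 1))
        Z = np.hstack([ones, chunk])   # z_i = (1, x_i, y_i)
        part = Z.T @ Z                 # partial Gamma for this chunk
        gamma = part if gamma is None else gamma + part
    return gamma

def linear_regression_from_gamma(gamma):
    """Derive least-squares coefficients from Gamma alone, with no second data pass.

    With z_i = (1, x_i, y_i), the upper-left block of Gamma is [1 X]^T [1 X]
    and the last column (excluding its final entry) is [1 X]^T y, so the
    normal equations can be solved entirely in main memory.
    """
    A = gamma[:-1, :-1]   # [1 X]^T [1 X]
    b = gamma[:-1, -1]    # [1 X]^T y
    return np.linalg.solve(A, b)

# Usage: stream the data set in chunks (e.g. from disk), summarize once,
# then iterate on the small (d+2) x (d+2) Gamma matrix in memory.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=10_000)
data = np.hstack([X, y[:, None]])
gamma = gamma_summarize(np.array_split(data, 10))
print(linear_regression_from_gamma(gamma))
```

Because each chunk contributes an additive partial Gamma, the summarization phase parallelizes naturally: partial matrices computed on different workers can simply be summed.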
In the big data era, the use of large-scale machine learning methods is becoming ubiquitous in data ...
An extremely common bottleneck encountered in statistical learning algorithms is inversion of huge c...
This is a dissertation in three parts, in each we explore the development and analysis of a parallel...
Aggregations help compute summaries of a data set, which are ubiquitous in various big data analyt...
Data analysis is an essential task for research. Modern large datasets indeed contain a high volume ...
The scalability problem in data mining involves the development of methods for handling large databa...
We are at the beginning of the multicore era. Computers will have increasingly many cores (processor...
Huge data sets containing millions of training examples with a large number of attributes are relati...
While computational modelling gets more complex and more accurate, its calculation costs have been i...
The course offers basics of analyzing data with machine learning and data mining algorithms in order...
Shared-memory systems such as regular desktops now possess enough memory to store large dat...
Recent years have shown the need for an automated process to discover interesting and hidden...
In this paper we study how to scale machine learning algorithms, which typically are des...
Machine Learning is a research field with substantial relevance for many applications in different a...
Data Analytic techniques have enhanced the human ability to solve many data-related problems. It ha...