Thesis: Ph.D. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. Cataloged from PDF version of thesis. Includes bibliographical references (pages 160-174).

In this thesis, we develop a family of real-time data reduction algorithms for large data streams by computing a compact and meaningful representation of the data called a coreset. This representation can then be used to enable efficient analysis such as segmentation, summarization, classification, and prediction. Our proposed algorithms support streams and datasets that are too large to store in memory, allow easy parallelization, and generalize to different data types and analyses. We discuss...
Big-data is the expression used to describe large data sets, which are complex and require analysis ...
As the world is becoming more digital, an increasing amount of data is generated that could provide ...
Developments in hardware, cloud computing, and the dissemination of the Internet during the last decade gave ...
Life-logging video streams, financial time series, and Twitter tweets are a few examples of high-dim...
The massive growth of modern datasets from different sources such as videos, social networks, and se...
Given an image stream, our on-line algorithm will select the semantically-important images that summ...
Organizing data into groups using unsupervised learning algorithms such as k-means clustering and GM...
The k-means problem seeks a clustering that minimizes the sum of squared errors cost function: For i...
In the last decade, real-time data processing has attracted much attention from both academic commun...
The wide availability of networked sensors such as GPS and cameras is enabling the creation of sensor n...
In the era of datasets of unprecedented sizes, data compression techniques are an attractive approac...
Two currently popular topics in computer science are machine learning and big data. Often the two ar...
This thesis studies clustering problems on data streams, specifically with applications to metric sp...
Massive high-dimensional data sets are ubiquitous in all scientific disciplines. Extracting meaningf...
University of Technology Sydney. Faculty of Engineering and Information Technology. Machine learning ...