Because only a small number of the traces generated by distributed tracing help in troubleshooting, the storage requirement of tracing can be significantly reduced by biasing trace selection towards anomalous traces. To this end, we propose SampleHST, a novel approach that samples on the fly from a stream of traces in an unsupervised manner. SampleHST adjusts the storage quota of normal and anomalous traces depending on the size of its budget. It first uses a forest of Half Space Trees (HSTs) to score traces. The score is based on the distribution of the mass scores across the trees, which characterizes the probability of observing different traces. The mass distribution from the HSTs is subsequently used to cluster the traces online leveraging ...
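The mass-based scoring step mentioned in the abstract can be illustrated with a minimal sketch of a Half Space Tree forest. This is not SampleHST's implementation: the class names `HSTNode` and `HSTForest`, the unit-cube feature range, the midpoint splits, and the plain sum used as a score are all assumptions made for illustration, and traces are assumed to have already been encoded as numeric vectors in [0, 1]. Each tree halves a randomly chosen dimension at every level; the mass of the leaf a point falls into counts how many reference points landed in the same region, so low mass across the forest suggests an unusual trace.

```python
import random


class HSTNode:
    """One node of a half-space tree over an axis-aligned box (illustrative)."""

    def __init__(self, depth, max_depth, mins, maxs, rng):
        self.mass = 0          # number of reference points seen in this region
        self.left = None
        self.right = None
        if depth < max_depth:
            # Pick a random dimension and split its current range in half.
            self.dim = rng.randrange(len(mins))
            self.split = (mins[self.dim] + maxs[self.dim]) / 2.0
            left_maxs = list(maxs)
            left_maxs[self.dim] = self.split
            right_mins = list(mins)
            right_mins[self.dim] = self.split
            self.left = HSTNode(depth + 1, max_depth, mins, left_maxs, rng)
            self.right = HSTNode(depth + 1, max_depth, right_mins, maxs, rng)

    def _child_for(self, x):
        return self.left if x[self.dim] < self.split else self.right

    def insert(self, x):
        """Increment the mass of every node on the path of x."""
        node = self
        node.mass += 1
        while node.left is not None:
            node = node._child_for(x)
            node.mass += 1

    def leaf_mass(self, x):
        """Mass of the leaf region that x falls into."""
        node = self
        while node.left is not None:
            node = node._child_for(x)
        return node.mass


class HSTForest:
    """A forest of half-space trees over the unit cube (assumed feature range)."""

    def __init__(self, n_trees, max_depth, n_dims, seed=0):
        rng = random.Random(seed)
        self.trees = [
            HSTNode(0, max_depth, [0.0] * n_dims, [1.0] * n_dims, rng)
            for _ in range(n_trees)
        ]

    def fit(self, points):
        for x in points:
            for tree in self.trees:
                tree.insert(x)

    def mass_profile(self, x):
        """Per-tree leaf masses: the mass distribution of x across the forest."""
        return [tree.leaf_mass(x) for tree in self.trees]

    def score(self, x):
        """Total mass; lower values indicate a more anomalous point."""
        return sum(self.mass_profile(x))


# Reference window: 50 synthetic "normal" points clustered near (0.2, 0.2).
data_rng = random.Random(1)
normal = [
    (0.18 + 0.04 * data_rng.random(), 0.18 + 0.04 * data_rng.random())
    for _ in range(50)
]

forest = HSTForest(n_trees=10, max_depth=3, n_dims=2)
forest.fit(normal)

typical_score = forest.score((0.2, 0.2))   # falls inside the dense region
unusual_score = forest.score((0.9, 0.9))   # falls in a region with no mass
```

In this sketch a sampler could spend more of its storage budget on points with low scores, while `mass_profile` exposes the per-tree masses that a later clustering step might consume; how SampleHST actually does either is not specified in the text above.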
We consider estimation of arbitrary range partitioning of data values and ranking of frequ...
In this paper, we aim to enable both efficient and accurate approximations on arbitrary sub-datasets...
Many applications generate data streams where online analysis needs are essent...
Existing distributed tracing tools such as HTrace use st...
Heterogeneous mobile, sensor, IoT, smart environment, and social networking applications have recent...
We propose a sampling infrastructure for gathering information about software from the set of runs e...
Most existing sampling algorithms on graphs (i.e., network-structured data) focus on sampling from m...
One of the most urgent challenges in event based performance analysis is the enormous amount of coll...
With the increasing deployment of heterogeneous memory architectures, the efficient execution of a w...
Tracing mechanisms in distributed systems give important insight into system properties and...
One response to the proliferation of large datasets has been to develop ingenious ways to throw reso...
Clustering methods are machine-learning algorithms that can be used to easily select the most repres...
Cluster filtering is a kind of test selection technique, which saves human efforts for resu...
We introduce an alternative to reservoir sampling, a classic and popular algorithm for drawing a fix...