Abstract—Many scientific applications now generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future. Fast extraction of knowledge from these large datasets holds the key to faster scientific discoveries. However, reading data from a traditional storage subsystem is slow because I/O performance lags far behind computational performance. Reducing data movement from the storage subsystem is widely considered a viable option for improving the performance of data analysis. In this paper, we propose Segmented Analysis, a data movement reduction strategy based on reusing results, where multiple similar analysis tasks process the same segments of dat...
Data mining is the process of extracting useful information or patterns from large raw sets of data....
This electronic version was submitted by the student author. The certified thesis is available in th...
In this paper we explore database segmentation in the context of a column-store DBMS targeted at a s...
In the converging world of High Performance Computing and Big Data, moving data is becoming ...
Scientific and data-intensive applications often require exploratory analysis on large datasets, whi...
According to a recent exascale roadmap report, analysis will be the limiting factor in gaining insig...
There is an ever widening performance gap between processors and main memory, a gap bridged by small...
The potential for improving the performance of data-intensive scientific programs by enhancing data ...
Thanks to advances in modern computer simulation systems, many scientific applications ge...
Increasingly larger scale applications are generating an unprecedented amount of data. However, the ...
The growing gap between processor clock speed and DRAM access time puts new demands on software and ...
Thanks to its RDataFrame interface, ROOT now supports the execution of the same physics analysis cod...
Emerging computer architectures will feature drastically decreased flops/byte (ratio of peak process...
© 2018 Elsevier B.V. We present a technique to automatically minimise the re-computation when a data...
Supercomputing advances have enabled computational science data volumes to grow at ever increasing r...