Group-by is a core database operation that is used extensively in OLTP, OLAP, and decision support systems. Many application scenarios require grouping values that are similar but not necessarily equal. In this paper, we propose a new SQL construct that supports similarity-based group-by (SGB). SGB is not a new clustering algorithm; rather, it is a practical and fast similarity grouping query operator that is compatible with other SQL operators and can be combined with them to answer similarity-based queries efficiently. In contrast to expensive clustering algorithms, the proposed similarity group-by operator maintains low execution times while still generating meaningful groupings that address many application needs. The paper presents a ...
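The abstract above does not give the syntax or exact semantics of the SGB operator, so the following is only an illustrative sketch of what "grouping similar but not necessarily equal values" can mean. It assumes one simple notion of similarity for one-dimensional data: after sorting, a value joins the current group if it lies within a maximum gap of the previous value. The function name `group_by_max_gap` and the parameter `max_gap` are hypothetical and are not taken from the paper.

```python
from typing import List


def group_by_max_gap(values: List[float], max_gap: float) -> List[List[float]]:
    """Group 1-D values so that neighbors in sorted order are at most max_gap apart.

    Assumption: this is one simple similarity criterion chosen for illustration,
    not the grouping semantics defined by the SGB operator itself.
    """
    groups: List[List[float]] = []
    for v in sorted(values):
        # Extend the current group if v is close enough to its last member.
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            # Otherwise start a new group with v as its first member.
            groups.append([v])
    return groups


# Hypothetical sensor readings; an equality-based group-by would put
# each distinct value into its own group.
readings = [20.1, 20.4, 35.0, 20.2, 35.3, 50.9]
groups = group_by_max_gap(readings, max_gap=1.0)

# Aggregate each group, as a group-by with COUNT and AVG would.
summary = [(len(g), sum(g) / len(g)) for g in groups]
```

Because the sketch needs only a sort and a single pass, it hints at why a similarity grouping operator can be much cheaper than a general clustering algorithm, although the actual cost of SGB depends on definitions the truncated abstract does not show.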
As Big Data becomes more relevant, existing grouping and clustering algorithms will need t...
As database technology is applied to more and more application domains, user queries are becoming in...
The identification and processing of similarities in the data play a key role in multiple applicatio...
The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a...
Decision Support Systems (DSS) are information systems that support decision making processes. In ma...
Many application scenarios, e.g., marketing analysis, sensor networks, and medical and biological ap...
Today, a myriad of data sources, from the Internet to business operations to scientific instruments,...
Similarity joins have been studied as key operations in multiple application domains, e.g., record l...
Many application scenarios can significantly benefit from the identification and processing...
Identifying similarities in large datasets is an essential operation in many applications such as bi...