For a storage system to keep pace with growing volumes of data, a natural solution is to deploy more servers, which expands storage capacity and mitigates server bottlenecks. Because so many servers are needed, they must be placed at geographically distributed locations, which incurs unavoidable communication costs. An important design problem is therefore how best to partition the data across the servers. To minimize cross-server traffic, the mainstream approach is data-centric: data with similar content are assigned to the same server. It is difficult, however, to quantify content similarity effectively when the content has many attributes or belongs to incomparable categories. In contrast, this dissertation advocates a query-c...
Mermaid is a testbed system which provides integrated access to multiple databases. Two ...
This paper considers a multi-query optimization issue for distributed similarity query processing, w...
A common approach to processing large RDF datasets is to partition the data in a cluster of shared-n...
A partition-and-replicate strategy for processing distributed queries referencing no fra...
The growing popularity of Resource Description Framework (RDF) as a mode for data exchange and integ...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
Distributed database technology is expected to have a significant impact on data processing in the u...
Web-scale RDF datasets are increasingly processed using distributed RDF data stores built...
We consider the problem of query optimization in distributed data stream systems where multiple cont...
In a multiple disk environment it is desirable to have techniques for efficient parallel execution o...
To simplify data integration and exchange, modern applications often represent their data using the...
Web-scale RDF datasets are increasingly processed using distributed RDF data stores built on top of ...
Distributed data processing is becoming a reality. Businesses want to do it for many reasons, and th...