The past few years have seen a major change in computing systems, as growing data volumes and stalling processor speeds require more and more applications to scale out to distributed systems. Today, myriad data sources, from the Internet to business operations to scientific instruments, produce large and valuable data streams. However, the processing capabilities of single machines have not kept pace with the growth of data, making that data increasingly hard to put to use. As a result, a growing number of organizations, not just web companies but also traditional enterprises and research labs, need to scale out their most important computations to clusters of hundreds of machines. At the same time, the speed and sophistication required of data pro...