Most of the parallelism in scientific and numeric applications takes the form of loops, so loop transformation has been studied extensively, especially in the areas of programming languages and compiler design. Almost all existing transformation approaches are control-centric: the transformation process starts by partitioning the iteration space, and the data space is decomposed only as a side effect. Originally designed for shared-memory multiprocessors, these control-centric approaches may not be suitable, under some circumstances, for current loosely coupled clusters and the Grid, where memory is physically distributed. In this paper, we introduce a novel data-centric and design-...