In this article we present a building block technique and a toolkit towards automatic discovery of workload-dependent performance bottlenecks. From one or more runs of a program, our profiler automatically measures how the performance of individual routines scales as a function of the input size, yielding clues to their growth rate. The output of the profiler is, for each executed routine of the program, a set of tuples that aggregate performance costs by input size. The collected profiles can be used to produce performance plots and derive trend functions by statistical curve fitting techniques. A key feature of our method is the ability to automatically measure the size of the input given to a generic code fragment: to this aim, we propos...