Speaker models for blind source separation are typically based on HMMs consisting of vast numbers of states to capture source spectral variation, and trained on large amounts of isolated speech. Since observations can be similar between sources, inference relies on sequential constraints from the state transition matrix which are, however, quite weak. To avoid these problems, we propose a strategy of capturing local deformations of the time-frequency energy distribution. Since consecutive spectral frames are highly correlated, each frame can be accurately described as a nonuniform deformation of its predecessor. A smooth pattern of deformations is indicative of a single speaker, and the cliffs in the deformation fields may indicate a speake...