In this paper, we present a software clustering approach that leverages the information conveyed by the zone in which each lexeme appears in the classes of object oriented systems. We define six zones in the source code: Class Name, Attribute Name, Method Name, Parameter Name, Comment, and Source Code Statement. These zones may convey information with different levels of relevance, and so their contribution should be differently weighed according to the software system under study. To this aim, we define a probabilistic model of the lexemes distribution whose parameters are automatically estimated by the Expectation-Maximization algorithm. The weights of the zones are then exploited to compute similarities among source code classes, which a...