An important step in data analysis is class assignment, which is usually done on the basis of a macroscopic phenotypic or bioprocess characteristic, such as high vs low growth, healthy vs diseased state, or high vs low productivity. Unfortunately, such an assignment may lump together samples that prove dissimilar under a more detailed phenotypic or bioprocess description, giving rise to models of lower quality and predictive power. In this paper we present a clustering algorithm for data preprocessing which involves the identification of fundamentally similar lots on the basis of the extent of similarity among the system variables. The algorithm combines aspects of cluster analysis and principal component analysis by applying agglomerativ...