Abstract: Far from being linguistic anomalies, multi-word expressions abound in natural language, yet their identification is surprisingly problematic. The same combination of words can occur as a compositional, fully lexical string or as a delexicalised multi-word unit (MWU). How can these different manifestations of the same word sequence be distinguished from one another? Compounding the problem, the creativity of language users gives rise to non-canonical forms of MWUs. How can these innovative uses be retrieved so that they can be incorporated into a comprehensive analysis of the MWU under study? This paper sets forth procedures for retrieving non-canonical variants from large general reference corpora, and addresses the ...