In this work we describe and optimize a general scheme based on lifting transforms on graphs for video coding. A graph is constructed to represent the video signal. Each pixel becomes a node in the graph and links between nodes represent similarity between them. Therefore, spatial neighbors and temporal motion-related pixels can be linked, while nonsimilar pixels (e.g., pixels across an edge) may not be. Then, a lifting-based transform, in which filterin operations are performed using linked nodes, is applied to this graph, leading to a 3-dimensional (spatio-temporal) directional transform which can be viewed as an extension of wavelet transforms for video. The design of the proposed scheme requires four main steps: (i) graph construction, ...