Q-learning is a regression-based approach that uses longitudinal data to construct dynamic treatment regimes, which are sequences of decision rules that use patient information to inform future treatment decisions. An optimal dynamic treatment regime is composed of a sequence of decision rules that indicate how to {optimally} individualize treatment using the patients' baseline and time-varying characteristics to optimize the final outcome. Constructing optimal dynamic regimes using Q-learning depends heavily on the assumption that regression models at each decision point are correctly specified; yet model checking in the context of Q-learning has been { largely} overlooked in the current literature. In this article, we show that resid...