In this paper, we study the semantic analysis of human motion from monocular surveillance video. Successful trajectory estimation and human-body modeling facilitate the semantic analysis of human activities in video sequences. As a first contribution, we propose a flexible framework that enables automatic analysis of human behavior and semantic events, and that provides analysis results at four levels for use in surveillance applications. The second contribution is a 3-D reconstruction scheme for scene understanding. The complete framework consists of four processing levels: (1) a pre-processing level including background modeling and multiple-person detection, (2) an object-based level performing trajector...