Multimodal interfaces (MMIs) are a promising human-computer interaction paradigm. They are feasible for a wide range of environments, yet they are especially suited to interactions that are spatially and temporally grounded in an environment in which the user is (physically) situated. Real-time interactive systems (RISs) are technical realizations of situated interaction environments, originating from application areas like virtual reality, mixed reality, human-robot interaction, and computer games. RISs include various dedicated processing-, simulation-, and rendering subsystems which collectively maintain a real-time simulation of a coherent application state. They thus fulfil the complex functional requirements of their application area...