Interpreting Situated Dialogue Utterances: an Update Model that Uses Speech, Gaze, and Gesture Information
Casey Kennington, Spyros Kousidis, David Schlangen
In situated dialogue, speakers share time and space. We present a statistical model for understanding natural language that works incrementally (i.e., in real, shared time) and is grounded (i.e., links to entities in the shared space). We describe the model with an example, then establish that it works well on non-situated, telephony-application-type utterances, show that it is effective in grounding language in a situated environment, and further show that it can make good use of embodied cues such as gaze and pointing in a fully multi-modal setting.