Interpreting Situated Dialogue Utterances: an Update Model that Uses Speech, Gaze, and Gesture Information

Casey Kennington, Spyros Kousidis, David Schlangen

In situated dialogue, speakers share time and space. We present a statistical model for understanding natural language that works incrementally (i.e., in real, shared time) and is grounded (i.e., links to entities in the shared space). We describe our model with an example, then establish that it works well on non-situated utterances of the kind found in telephony applications, show that it is effective in grounding language in a situated environment, and further show that it can make good use of embodied cues such as gaze and pointing in a fully multimodal setting.

SIGdial 2013

14th Annual SIGdial Meeting on Discourse and Dialogue
