Embodied Question Answering

Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

We present a new AI task – Embodied Question Answering (EmbodiedQA) – where an agent is spawned at a random location in a 3D environment and asked a question (‘What color is the car?’). To answer, the agent must first navigate intelligently to explore the environment, gather the necessary visual information through first-person (egocentric) vision, and then answer the question (‘orange’). EmbodiedQA requires a range of AI skills – language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long-term memory, and grounding language into actions. In this work, we develop a dataset of questions and answers in House3D environments (Wu et al., 2018), evaluation metrics, and a hierarchical model trained with imitation and reinforcement learning for this task.

SIGdial 2018

19th Annual SIGdial Meeting on Discourse and Dialogue
