Reinforcement Learning in Multi-Party Trading Dialog

Takuya Hiraoka, Kallirroi Georgila, Elnaz Nouri, David Traum and Satoshi Nakamura

In this paper, we apply reinforcement learning (RL) to a multi-party trading scenario where the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions. The negotiation strategy of the learner is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the performance of the learner varies depending on the RL algorithm used and the number of traders. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is a very hard problem; and (2) the use of neural fitted Q iteration combined with an incremental reward function produces negotiation policies as effective as, or even better than, the policies of two strong hand-crafted baselines.

SIGdial 2015

16th Annual SIGdial Meeting on Discourse and Dialogue


