Takuya Hiraoka, Kallirroi Georgila, Elnaz Nouri, David Traum and Satoshi Nakamura
In this paper, we apply reinforcement learning (RL) to a multi-party trading scenario where the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions. The negotiation strategy of the learner is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the performance of the learner varies depending on the RL algorithm used and the number of traders. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is a very hard problem; and (2) the use of neural fitted Q iteration combined with an incremental reward function produces negotiation policies as effective as, or even better than, the policies of two strong hand-crafted baselines.