|Inigo Casanueva, Thomas Hain, Heidi Christensen, Ricard Marxer and Phil Green|
Model-free reinforcement learning has been shown to be a promising data driven approach for automatic dialogue policy optimization, but a relatively large amount of dialogue interactions is needed before the system reaches reasonable performance. Recently, Gaussian process based reinforcement learning methods have been shown to reduce the number of dialogues needed to reach optimal performance, and pre-training the policy with data gathered from different dialogue systems has further reduced this amount. Following this idea, a dialogue system designed for a single speaker can be initialised with data from other speakers, but if the dynamics of the speakers are very different the model will have a poor performance. When data gathered from different speakers is available, selecting the data from the most similar ones might improve the performance. We propose a method which automatically selects the data to transfer by defining a similarity measure between speakers, and uses this measure to weight the influence of the data from each speaker in the policy model. The methods are tested by simulating users with different severities of dysarthria interacting with a voice enabled environmental control system.