InterSpeech 2021

A Neural-Network-Based Approach to Identifying Speakers in Novels
(3-minute introduction)

Yue Chen (USTC, China), Zhen-Hua Ling (USTC, China), Qing-Feng Liu (USTC, China)
Speaker identification in novels aims to determine who utters a quote in a given context through text analysis. This task is important for speech synthesis systems, which must assign appropriate voices to the quotes when producing audiobooks. However, existing approaches rely on manual features and traditional machine learning classifiers, which limits the accuracy of speaker identification. In this paper, we propose a method that tackles this challenging problem with deep learning. We formulate speaker identification as a scoring task and build a candidate scoring network (CSN) based on BERT. Candidate-specific segments are introduced to eliminate redundant context information. Moreover, a revision algorithm is designed that exploits the speaker alternation pattern in two-party dialogues. Experiments have been conducted on a dataset built from the Chinese novel World of Plainness. The results show that our proposed method achieves new state-of-the-art performance with an identification accuracy of 82.5%, which outperforms the baseline using manual features by 12%.
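The scoring formulation and the two-party revision idea can be illustrated with a minimal sketch. It assumes that per-candidate scores have already been produced by some scoring model; the BERT-based CSN itself is not reproduced here. The function names `pick_speaker` and `revise_two_party`, the speaker labels, and the score values are all hypothetical, and the revision rule shown (choosing the alternating assignment with the higher total score) is a simplified stand-in rather than the algorithm described in the paper.

```python
from typing import Dict, List


def pick_speaker(scores: Dict[str, float]) -> str:
    """Scoring formulation: return the candidate with the highest score."""
    return max(scores, key=scores.get)


def revise_two_party(
    quote_scores: List[Dict[str, float]], a: str, b: str
) -> List[str]:
    """Hypothetical revision step for a two-party dialogue.

    Assumes the two speakers strictly alternate, so only two assignments
    are possible (a, b, a, ... or b, a, b, ...). The one with the higher
    summed score is kept. This is an illustrative simplification.
    """

    def total(first: str, second: str) -> float:
        order = (first, second)
        return sum(s.get(order[i % 2], 0.0) for i, s in enumerate(quote_scores))

    first, second = (a, b) if total(a, b) >= total(b, a) else (b, a)
    return [(first, second)[i % 2] for i in range(len(quote_scores))]


# Made-up scores for four consecutive quotes between speakers "A" and "B".
scores = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.2, "B": 0.8},
    {"A": 0.4, "B": 0.6},  # independent scoring picks "B" here
    {"A": 0.3, "B": 0.7},
]

independent = [pick_speaker(s) for s in scores]
revised = revise_two_party(scores, "A", "B")
print(independent)
print(revised)
```

In this toy input, scoring each quote independently assigns the third quote to "B", which breaks the alternation; the revision step overrides that choice because the A, B, A, B assignment has the higher total score.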