Robust Language Recognition Based on Diverse Features

Qian Zhang, Gang Liu and John Hansen

In real-world scenarios, robust language identification (LID) is often hindered by background noise, channel mismatch, and speech duration mismatch. To address these issues, this study examines advances in diverse acoustic features and back-ends, and their influence on LID system fusion. Little research exists on selecting complementary features for multi-system fusion in LID. A set of distinct features is considered, grouped into three categories: classical features, innovative features, and extensional features. Both front-end concatenation and back-end fusion are evaluated. The results suggest that no single feature type is universally vital across all LID tasks, and that fusing a diverse set is needed to sustain LID performance in challenging scenarios. Back-end fusion also consistently and significantly improves system performance. Specifically, the proposed hybrid fusion method improves system performance by 38.5% and 46.2% on the DARPA RATS and NIST LRE09 data sets, respectively.
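To make the distinction between front-end concatenation and back-end fusion concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method: the feature streams, the language labels, the equal weights, and the choice of a weighted linear sum for score fusion are all hypothetical, since the abstract does not specify how the hybrid fusion is computed.

```python
from typing import Dict, List, Sequence


def concatenate_features(
    streams: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Front-end concatenation: join the per-frame vectors of several
    feature streams into one longer vector per frame."""
    n_frames = len(streams[0])
    if any(len(stream) != n_frames for stream in streams):
        raise ValueError("all feature streams must have the same number of frames")
    return [
        [value for stream in streams for value in stream[t]]
        for t in range(n_frames)
    ]


def fuse_scores(
    system_scores: Sequence[Dict[str, float]],
    weights: Sequence[float],
) -> Dict[str, float]:
    """Back-end fusion (assumed here to be a weighted linear sum):
    combine per-language scores produced by separate systems."""
    if len(system_scores) != len(weights):
        raise ValueError("need exactly one weight per system")
    languages = list(system_scores[0].keys())
    return {
        lang: sum(w * scores[lang] for w, scores in zip(weights, system_scores))
        for lang in languages
    }


# Hypothetical toy data: two feature streams, two frames each.
stream_a = [[1.0, 2.0], [3.0, 4.0]]
stream_b = [[0.5], [0.7]]
combined = concatenate_features([stream_a, stream_b])

# Hypothetical per-language scores from two independent systems.
scores_system_1 = {"eng": 0.2, "spa": 0.8}
scores_system_2 = {"eng": 0.6, "spa": 0.4}
fused = fuse_scores([scores_system_1, scores_system_2], [0.5, 0.5])
best_language = max(fused, key=fused.get)
```

Concatenation merges information before any model is trained, so a single back-end sees all features at once. Score fusion keeps one system per feature type and combines only their outputs. In practice the fusion weights would be learned on held-out data rather than fixed by hand as they are here.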

Odyssey 2014

The Speaker and Language Recognition Workshop
