|Qian Zhang, Gang Liu and John Hansen|
In real-world scenarios, robust language identification (LID) is often hindered by factors such as background noise, channel mismatch, and speech duration mismatch. To address these issues, this study examines advances in diverse acoustic features and back-ends, and their influence on LID system fusion. Little research exists on selecting complementary features for multi-system fusion in LID. A set of distinct features is considered, grouped into three categories: classical features, innovative features, and extensional features. In addition, both front-end concatenation and back-end fusion are considered. The results suggest that no single feature type is universally vital across all LID tasks, and that fusing a diverse set of features is needed to sustain LID performance in challenging scenarios. Moreover, back-end fusion consistently and significantly improves system performance. Specifically, the proposed hybrid fusion method improves system performance by +38.5% and +46.2% on the DARPA RATS and NIST LRE09 data sets, respectively.