Fricative Phoneme Detection Using Deep Neural Networks and its Comparison to Traditional Methods
Metehan Yurt (Fraunhofer IIS, Germany), Pavan Kantharaju (Fraunhofer IIS, Germany), Sascha Disch (Fraunhofer IIS, Germany), Andreas Niedermeier (Fraunhofer IIS, Germany), Alberto N. Escalante-B. (WS Audiology, Germany), Veniamin I. Morgenshtern (FAU Erlangen-Nürnberg, Germany)
Accurate phoneme detection and processing can enhance speech intelligibility in hearing aids and in audio and speech codecs. Because fricative phonemes concentrate much of their energy in high-frequency bands, hearing aids use frequency-lowering algorithms to improve fricative intelligibility for people with high-frequency hearing loss. In traditional audio codecs, which process speech in blocks, spectral smearing around fricative phoneme borders causes pre- and post-echo artifacts. Detecting fricative borders and adapting the processing accordingly could therefore improve speech quality. Until recently, phoneme detection and analysis relied mostly on hand-crafted features specific to each phoneme class. In this paper, we present a deep-learning-based fricative phoneme detection algorithm that exceeds the state-of-the-art fricative detection accuracy on the TIMIT speech corpus. We also compare our method with approaches that use classical signal processing for fricative detection, and we evaluate it on TIMIT files coded with the AAC codec followed by bandwidth limitation. The reported results of our deep learning approach on the original TIMIT files are reproducible and come with easy-to-use code that can serve as a baseline for future research on this topic.
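The abstract contrasts deep learning with classical, feature-based fricative detection that exploits the concentration of fricative energy in high frequencies. As an illustration of that classical idea only, and not the authors' detector or any specific method evaluated in the paper, the minimal sketch below computes, for each frame, the ratio of spectral energy above a cutoff to total energy and flags frames whose ratio exceeds a threshold. The function names, the 4 kHz cutoff, the 512-sample frames, the 256-sample hop, and the 0.5 threshold are all hypothetical choices; the 16 kHz default matches the TIMIT sampling rate.

```python
import numpy as np


def high_band_energy_ratio(signal, sr=16000, frame_len=512, hop=256, cutoff_hz=4000.0):
    """Hypothetical feature: per-frame share of spectral energy at or above cutoff_hz."""
    if len(signal) < frame_len:
        return np.empty(0)  # too short for even one full frame
    window = np.hanning(frame_len)
    n_frames = (len(signal) - frame_len) // hop + 1
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)  # bin center frequencies in Hz
    high = freqs >= cutoff_hz  # mask selecting the "fricative" band
    ratios = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2  # power spectrum of the windowed frame
        total = power.sum()
        ratios[i] = power[high].sum() / total if total > 0 else 0.0
    return ratios


def flag_fricative_frames(signal, sr=16000, threshold=0.5, **kwargs):
    """Mark frames as fricative candidates when the high-band ratio exceeds threshold."""
    return high_band_energy_ratio(signal, sr=sr, **kwargs) > threshold


# Usage on a synthetic signal: a 300 Hz tone (vowel-like, low-frequency energy)
# followed by a 6 kHz tone (standing in for high-frequency fricative energy).
sr = 16000
t = np.arange(8000) / sr
signal = np.concatenate([np.sin(2 * np.pi * 300 * t), np.sin(2 * np.pi * 6000 * t)])
flags = flag_fricative_frames(signal, sr=sr)
```

A fixed high-band feature like this is cheap to compute, but it depends directly on the spectral content above the cutoff, so it would be expected to degrade when that band is removed, for example by the bandwidth limitation mentioned above.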