|Manthan Sharma (Indian Institute of Science, India), Navaneetha Gaddam (Indian Institute of Science, India), Tejas Umesh (Indian Institute of Science, India), Aditya Murthy (Indian Institute of Science, India), Prasanta Kumar Ghosh (Indian Institute of Science, India)|
Electromyography (EMG) signals have been extensively used to capture facial muscle movements while speaking since they are one of the most closely related bio-signals generated during speech production. In this work, we focus on speech acoustics to EMG prediction. We present a comparative study of ten different EMG signal-based features including Time Domain (TD) features existing in the literature to examine their effectiveness in speech acoustics to EMG inverse (AEI) mapping. We propose a novel feature based on the Hilbert envelope of the filtered EMG signal. The raw EMG signal is reconstructed from these features as well. For the AEI mapping, we use a bi-directional long short-term memory (BLSTM) network in a session-dependent manner. To estimate the raw EMG signal from the EMG features, we use a CNN-BLSTM model comprising of a convolution neural network (CNN) followed by BLSTM layers. AEI mapping performance using the BLSTM network reveals that the Hilbert envelope based feature is predicted from speech with the highest accuracy, among all the features. Therefore, it could be the most representative feature of the underlying muscle activation during speech production. The proposed Hilbert envelope feature, when used together with the existing TD features, improves the raw EMG signal reconstruction performance compared to using the TD features alone.