InterSpeech 2021

Multimodal Sentiment Analysis with Temporal Modality Attention

Fan Qian (Harbin Institute of Technology, China), Jiqing Han (Harbin Institute of Technology, China)
Multimodal sentiment analysis is an important research area that involves integrating information from multiple modalities to identify a speaker's underlying attitude. The core challenge is to model cross-modal interactions, which span both the different modalities and time. Although great progress has been made, existing methods are still insufficient for modeling such interactions. Inspired by research in cognitive neuroscience showing that humans perceive intentions by focusing on different modalities over time, in this paper we propose a novel attention mechanism called Temporal Modality Attention (TMA) to simulate this process. Cross-modal interactions are modeled using this human-like TMA mechanism, which focuses on specific modalities dynamically as recurrent modeling proceeds. To verify the effectiveness of TMA, we conduct comprehensive experiments on multiple benchmark datasets for multimodal sentiment analysis. The results show a consistently significant improvement over the baseline models.
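The abstract does not give the exact formulation of TMA, but the general idea it describes (dynamically weighting modalities at each timestep during recurrent modeling) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the scoring vector `w_score` and the simple dot-product scoring are assumptions for demonstration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_modality_attention(feats, w_score):
    """Illustrative modality-attention fusion (an assumed formulation).

    feats:   array of shape (T, M, d) -- T timesteps, M modalities,
             d-dimensional feature per modality per timestep.
    w_score: assumed learnable scoring vector of shape (d,).

    At each timestep, each modality is scored, the scores are
    softmax-normalized into attention weights, and the modalities
    are fused as a weighted sum -- so the model "focuses on"
    different modalities at different times.

    Returns: fused features (T, d) and attention weights (T, M).
    """
    T, M, d = feats.shape
    fused = np.zeros((T, d))
    weights = np.zeros((T, M))
    for t in range(T):
        scores = feats[t] @ w_score    # (M,) one score per modality
        alpha = softmax(scores)        # attention over modalities at step t
        weights[t] = alpha
        fused[t] = alpha @ feats[t]    # weighted sum of modality features
    return fused, weights

# Example: 5 timesteps, 3 modalities (e.g. audio/text/visual), 8-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 3, 8))
w_score = rng.normal(size=8)
fused, weights = temporal_modality_attention(feats, w_score)
```

In a full model, the fused sequence would then be fed to a recurrent layer, and the attention weights per timestep reveal which modality the model attended to at each moment.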