Injecting Descriptive Meta-information into Pre-trained Language Models with Hypernetworks (3 minutes introduction)

Wenying Duan (Nanchang University, China), Xiaoxi He (ETH Zürich, Switzerland), Zimu Zhou (Singapore Management University, Singapore), Hong Rao (Nanchang University, China), Lothar Thiele (ETH Zürich, Switzerland)

Pre-trained language models have been widely adopted as backbones in various natural language processing tasks. However, existing pre-trained language models ignore the descriptive meta-information in the text, such as the distinction between the title and the main body, which leads to over-weighted attention to insignificant text. In this paper, we propose a hypernetwork-based architecture to model the descriptive meta-information and integrate it into pre-trained language models. Evaluations on three natural language processing tasks show that our method notably improves the performance of pre-trained language models and achieves state-of-the-art results on keyphrase extraction.
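
To make the general idea of a hypernetwork more concrete, the following is a minimal sketch in plain Python, not the architecture from the paper. It assumes a linear hypernetwork that maps a meta-information embedding (here, hypothetical one-hot vectors for "title" and "main_body") to a per-dimension scale and shift, which then modulate a token's hidden state. All names, dimensions, and the scale-and-shift form are illustrative assumptions.

```python
import random


def make_hypernetwork(meta_dim, hidden_dim, seed=0):
    """Build a toy linear hypernetwork (illustrative assumption, not the paper's model).

    It maps a meta-information embedding to two generated parameter
    vectors: a scale and a shift, each of length hidden_dim.
    """
    rng = random.Random(seed)
    # One weight row per generated parameter: hidden_dim scales, then hidden_dim shifts.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(meta_dim)]
               for _ in range(2 * hidden_dim)]

    def generate(meta_embedding):
        out = [sum(w * m for w, m in zip(row, meta_embedding)) for row in weights]
        # Scales are centered at 1 so a zero embedding leaves the state unchanged.
        scale = [1.0 + v for v in out[:hidden_dim]]
        shift = out[hidden_dim:]
        return scale, shift

    return generate


def modulate(hidden_state, scale, shift):
    """Apply the generated parameters to one token's hidden state."""
    return [s * h + b for h, s, b in zip(hidden_state, scale, shift)]


# Hypothetical meta-information embeddings (one-hot, for illustration only).
META_EMBEDDINGS = {"title": [1.0, 0.0], "main_body": [0.0, 1.0]}

hypernet = make_hypernetwork(meta_dim=2, hidden_dim=4)
token_state = [0.5, -1.0, 2.0, 0.0]  # stand-in for a language-model hidden state

title_out = modulate(token_state, *hypernet(META_EMBEDDINGS["title"]))
body_out = modulate(token_state, *hypernet(META_EMBEDDINGS["main_body"]))
neutral_out = modulate(token_state, *hypernet([0.0, 0.0]))
```

In this sketch the same token representation is transformed differently depending on whether it is tagged as title or main-body text, which is the kind of conditioning on meta-information that the abstract describes at a high level. In a real system the hypernetwork weights would be learned jointly with the downstream task rather than drawn at random.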

InterSpeech 2021

Related Recordings

Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering
(3 minutes introduction)

Causal Confusion Reduction for Robust Multi-Domain Dialogue Policy
(3 minutes introduction)
