Injecting Descriptive Meta-information into Pre-trained Language Models with Hypernetworks
Wenying Duan (Nanchang University, China), Xiaoxi He (ETH Zürich, Switzerland), Zimu Zhou (Singapore Management University, Singapore), Hong Rao (Nanchang University, China), Lothar Thiele (ETH Zürich, Switzerland)
Pre-trained language models have been widely adopted as backbones in various natural language processing tasks. However, existing pre-trained language models ignore the descriptive meta-information in text, such as the distinction between the title and the main body, which leads them to pay disproportionate attention to insignificant text. In this paper, we propose a hypernetwork-based architecture that models this descriptive meta-information and integrates it into pre-trained language models. Evaluations on three natural language processing tasks show that our method notably improves the performance of pre-trained language models and achieves state-of-the-art results on keyphrase extraction.
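
The abstract does not spell out the architecture, so the following is only a generic sketch of the hypernetwork idea, not the authors' model. It assumes each token carries a meta-information label (here 0 for title, 1 for main body), and a small hypernetwork maps that label's embedding to the weights of a residual layer applied to the token's hidden state. All names (`MetaHyperNetwork`, `apply_meta_layer`), the dimensions, and the random hidden states standing in for a pre-trained model's output are hypothetical.

```python
import numpy as np


class MetaHyperNetwork:
    """Hypothetical hypernetwork: maps a meta-information label to layer weights."""

    def __init__(self, n_meta_types, d_meta, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.d_model = d_model
        # One learnable embedding per meta-information type (e.g. title, body).
        self.meta_embed = rng.normal(0.0, 0.1, (n_meta_types, d_meta))
        # Generator parameters: these produce the weights of the target layer.
        self.w_gen = rng.normal(0.0, 0.01, (d_meta, d_model * d_model))
        self.b_gen = rng.normal(0.0, 0.01, (d_meta, d_model))

    def generate(self, meta_ids):
        """Return per-token weight matrices and biases for the given labels."""
        e = self.meta_embed[meta_ids]  # (seq_len, d_meta)
        W = (e @ self.w_gen).reshape(-1, self.d_model, self.d_model)
        b = e @ self.b_gen  # (seq_len, d_model)
        return W, b


def apply_meta_layer(hidden, meta_ids, hyper):
    """Add a meta-conditioned residual update to each token's hidden state."""
    W, b = hyper.generate(meta_ids)
    # For each token s: hidden[s] @ W[s], using that token's generated weights.
    return hidden + np.einsum("sd,sde->se", hidden, W) + b


# Toy usage: 5 tokens, the first 2 from the title, the last 3 from the body.
rng = np.random.default_rng(1)
hidden = rng.normal(size=(5, 8))  # stand-in for pre-trained model states
meta_ids = np.array([0, 0, 1, 1, 1])
hyper = MetaHyperNetwork(n_meta_types=2, d_meta=4, d_model=8)
out = apply_meta_layer(hidden, meta_ids, hyper)
W, b = hyper.generate(meta_ids)
```

In this sketch, tokens that share a meta-information label receive identical generated weights, while title and body tokens are transformed differently. In a real system the generator parameters would be trained jointly with the downstream task rather than left at their random initial values.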