Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks

Xiepeng Li, Zhexi Zhang, Wei Zhu, Zheng Li, Yuan Ni, Peng Gao, Junchi Yan, G. Xie
{"title":"Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks","authors":"Xiepeng Li, Zhexi Zhang, Wei Zhu, Zheng Li, Yuan Ni, Peng Gao, Junchi Yan, G. Xie","doi":"10.18653/v1/D19-6011","DOIUrl":null,"url":null,"abstract":"To solve the shared tasks of COIN: COmmonsense INference in Natural Language Processing) Workshop in , we need explore the impact of knowledge representation in modeling commonsense knowledge to boost performance of machine reading comprehension beyond simple text matching. There are two approaches to represent knowledge in the low-dimensional space. The first is to leverage large-scale unsupervised text corpus to train fixed or contextual language representations. The second approach is to explicitly express knowledge into a knowledge graph (KG), and then fit a model to represent the facts in the KG. We have experimented both (a) improving the fine-tuning of pre-trained language models on a task with a small dataset size, by leveraging datasets of similar tasks; and (b) incorporating the distributional representations of a KG onto the representations of pre-trained language models, via simply concatenation or multi-head attention. We find out that: (a) for task 1, first fine-tuning on larger datasets like RACE (Lai et al., 2017) and SWAG (Zellersetal.,2018), and then fine-tuning on the target task improve the performance significantly; (b) for task 2, we find out the incorporating a KG of commonsense knowledge, WordNet (Miller, 1995) into the Bert model (Devlin et al., 2018) is helpful, however, it will hurts the performace of XLNET (Yangetal.,2019), a more powerful pre-trained model. Our approaches achieve the state-of-the-art results on both shared task’s official test data, outperforming all the other submissions.","PeriodicalId":192716,"journal":{"name":"Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/D19-6011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

To solve the shared tasks of the COIN (COmmonsense INference in Natural Language Processing) Workshop, we explore the impact of knowledge representation in modeling commonsense knowledge to boost the performance of machine reading comprehension beyond simple text matching. There are two approaches to representing knowledge in a low-dimensional space. The first is to leverage a large-scale unsupervised text corpus to train fixed or contextual language representations. The second is to explicitly express knowledge in a knowledge graph (KG), and then fit a model to represent the facts in the KG. We experiment with both (a) improving the fine-tuning of pre-trained language models on a task with a small dataset by leveraging datasets of similar tasks, and (b) incorporating the distributional representations of a KG into the representations of pre-trained language models, via simple concatenation or multi-head attention. We find that: (a) for task 1, first fine-tuning on larger datasets such as RACE (Lai et al., 2017) and SWAG (Zellers et al., 2018), and then fine-tuning on the target task, improves performance significantly; (b) for task 2, incorporating a KG of commonsense knowledge, WordNet (Miller, 1995), into the BERT model (Devlin et al., 2018) is helpful; however, it hurts the performance of XLNet (Yang et al., 2019), a more powerful pre-trained model. Our approaches achieve state-of-the-art results on both shared tasks' official test data, outperforming all other submissions.
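For concreteness, here is a minimal PyTorch sketch (not the authors' released code) of the two knowledge-fusion variants the abstract describes: concatenating KG embeddings with the pre-trained language model's token representations, or letting the token representations attend over them with multi-head attention. The module name, dimensions, and the way KG facts are pooled are illustrative assumptions, not details taken from the paper.

```python
# Sketch of fusing KG (e.g. WordNet synset) embeddings with contextual LM states,
# under assumed dimensions; the real system's wiring may differ.
import torch
import torch.nn as nn


class KGFusion(nn.Module):
    def __init__(self, lm_dim=768, kg_dim=100, num_heads=8, mode="attention"):
        super().__init__()
        self.mode = mode
        # Project KG embeddings into the LM's hidden space.
        self.kg_proj = nn.Linear(kg_dim, lm_dim)
        if mode == "attention":
            # LM token states attend over the projected KG entries.
            self.attn = nn.MultiheadAttention(lm_dim, num_heads, batch_first=True)
        # Both variants end with a projection of the concatenated features.
        self.out = nn.Linear(lm_dim * 2, lm_dim)

    def forward(self, lm_states, kg_embeds):
        # lm_states: (batch, seq_len, lm_dim)  contextual token representations
        # kg_embeds: (batch, num_facts, kg_dim) retrieved KG fact embeddings
        kg = self.kg_proj(kg_embeds)
        if self.mode == "attention":
            attended, _ = self.attn(query=lm_states, key=kg, value=kg)
            fused = torch.cat([lm_states, attended], dim=-1)
        else:
            # Simple concatenation: pool the KG facts and append to every token.
            pooled = kg.mean(dim=1, keepdim=True).expand_as(lm_states)
            fused = torch.cat([lm_states, pooled], dim=-1)
        return self.out(fused)


if __name__ == "__main__":
    fusion = KGFusion(mode="attention")
    lm_states = torch.randn(2, 32, 768)   # e.g. BERT-base outputs
    kg_embeds = torch.randn(2, 5, 100)    # e.g. WordNet synset vectors
    print(fusion(lm_states, kg_embeds).shape)  # torch.Size([2, 32, 768])
```

The fused states would then replace the plain LM states in the downstream reading-comprehension head; the two-stage fine-tuning for task 1 (first on RACE/SWAG, then on the target task) is simply the same training loop run twice with different datasets.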