Funny words detection via Contrastive Representations and Pre-trained Language Model

Yiming Du, Zelin Tian
{"title":"Funny words detection via Contrastive Representations and Pre-trained Language Model","authors":"Yiming Du, Zelin Tian","doi":"10.1109/AINIT54228.2021.00078","DOIUrl":null,"url":null,"abstract":"Funniness detection of news headlines is a challenging task in computational linguistics. However, most existing works on funniness detection mainly tackle the scenario by simply judging whether a sentence is humorous, whose result is unstable due to factors such as sentence length. To solve this issue, in this paper, our idea is to fine-grained mine the detailed information of the words and the contextual relationship between different words in the sentence, which help to evaluate the correlation between keywords and the funniness of news headlines quantitatively. Specifically, we propose a funny words detection algorithm based on the contrastive representations learning and BERT model. To quantify the impact of different words on the degree of humor, we first subtract the funniness grades of the original news headlines and the funniness grades of the original news headlines with a single word replaced. Both funniness grades are predicted with a pre-trained model, which is supervised by a a threshold to limit the amount of data and ensure the validity of data. To ensure the accuracy of our prediction, we further introduce the contrastive learning to constrain the differences of news headlines before and after word replacement. Finally, according to the Root Mean Square Error (RMSE) matrix in our experiment, we develop a BERT model with mixed sequence embedding to generate a table about words and their corresponding funniness improvement about the news headlines.","PeriodicalId":326400,"journal":{"name":"2021 2nd International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 2nd International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AINIT54228.2021.00078","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Funniness detection in news headlines is a challenging task in computational linguistics. However, most existing work on funniness detection tackles the problem by simply judging whether a whole sentence is humorous, and the result is unstable due to factors such as sentence length. To address this issue, our idea in this paper is to mine fine-grained information about individual words and the contextual relationships between words in a sentence, which helps to quantitatively evaluate the correlation between keywords and the funniness of news headlines. Specifically, we propose a funny-word detection algorithm based on contrastive representation learning and a BERT model. To quantify the impact of individual words on the degree of humor, we first compute the difference between the funniness grade of an original news headline and the grade of the same headline with a single word replaced. Both funniness grades are predicted with a pre-trained model, supervised by a threshold that limits the amount of data and ensures its validity. To ensure the accuracy of our predictions, we further introduce contrastive learning to constrain the difference between news headlines before and after word replacement. Finally, guided by the Root Mean Square Error (RMSE) metric in our experiments, we develop a BERT model with mixed sequence embedding to generate a table of words and their corresponding funniness improvements for the news headlines.
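The word-level scoring step can be made concrete with a short sketch. Assuming a hypothetical BERT regressor fine-tuned to output a scalar funniness grade (the checkpoint name, the `score_headline` helper, and the `THRESHOLD` value are illustrative assumptions, not the authors' released code), the per-word contribution is the difference between the grades predicted before and after replacing a single word, kept only when it clears a validity threshold:

```python
# Minimal sketch of the funniness-delta idea; all names are hypothetical.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Assumed: a BERT regressor whose single output head (num_labels=1) has
# been fine-tuned to predict a scalar funniness grade for a headline.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1
)
model.eval()

THRESHOLD = 0.5  # hypothetical cut-off that filters out noisy deltas


@torch.no_grad()
def score_headline(headline: str) -> float:
    """Predict a scalar funniness grade for one headline."""
    inputs = tokenizer(headline, return_tensors="pt", truncation=True)
    return model(**inputs).logits.item()


def word_funniness_deltas(headline: str, replacements: dict) -> dict:
    """For each (original word -> edit word) pair, compute the change in
    predicted funniness when that single word is replaced."""
    base = score_headline(headline)
    deltas = {}
    for original, edited in replacements.items():
        edited_headline = headline.replace(original, edited, 1)
        delta = score_headline(edited_headline) - base
        # Keep only deltas whose magnitude clears the validity threshold.
        if abs(delta) >= THRESHOLD:
            deltas[original] = delta
    return deltas


print(word_funniness_deltas(
    "Scientists discover water on Mars",
    {"water": "pizza", "Mars": "Ohio"},
))
```

Collecting these deltas over a corpus of headlines is what would populate the word-to-funniness-improvement table the abstract describes.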
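The contrastive constraint on headlines before and after replacement can likewise be sketched. The InfoNCE-style loss below is an assumption about the form of the objective (the abstract does not spell out the loss); it pulls each headline's representation toward that of its single-word edit and away from the other headlines in the batch:

```python
# Hedged sketch of a contrastive constraint between original and edited
# headlines; the InfoNCE form is assumed, not taken from the paper.
import torch
import torch.nn.functional as F


def info_nce(orig_emb, edit_emb, temperature: float = 0.07):
    """orig_emb, edit_emb: (batch, dim) sentence embeddings of headlines
    before and after word replacement; row i of each is a positive pair."""
    orig = F.normalize(orig_emb, dim=-1)
    edit = F.normalize(edit_emb, dim=-1)
    logits = orig @ edit.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(orig.size(0))    # positive pairs on the diagonal
    return F.cross_entropy(logits, targets)


# Usage with random stand-in embeddings of BERT's hidden size (768).
loss = info_nce(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```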