Improved Deep Bi-directional Transformer Keyword Extraction based on Semantic Understanding of News

Rui Cheng, Haijun Zhang
2022 9th International Conference on Dependable Systems and Their Applications (DSA), published 2022-08-01
DOI: 10.1109/DSA56465.2022.00110 (https://doi.org/10.1109/DSA56465.2022.00110)

Abstract

Existing keyword extraction methods tend to neglect semantic information and to produce keywords that lack diversity. To address these problems, this paper proposes an improved deep bi-directional transformer model based on semantic understanding of news, which combines pre-trained word vectors with the K-Means algorithm. The BERT pre-trained model first extracts context-dependent word vectors rich in semantic information; the K-Means clustering algorithm then groups these vectors into clusters corresponding to different topics. The extracted keywords semantically highlight the central theme of the text while also alleviating the lack of keyword diversity. Experiments show that the proposed model significantly improves accuracy, recall, and F-value compared with the RAKE, TF-IDF, LDA, RNN, and LSTM keyword extraction models and with word2vec models, which extract static word vectors.
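
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how keywords are chosen from each cluster, so the sketch assumes the word nearest each cluster centroid is taken as that topic's keyword. The 2-D vectors are hand-made stand-ins for BERT embeddings, the names `kmeans` and `extract_keywords` are hypothetical, and the farthest-point initialization is a simplification chosen to keep the example deterministic.

```python
import math

def kmeans(vectors, k, iters=50):
    """Plain K-Means with deterministic farthest-point initialization (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def extract_keywords(words, vectors, k):
    """Return one keyword per cluster: the word closest to the centroid (assumed rule)."""
    labels, centroids = kmeans(vectors, k)
    keywords = []
    for c in range(k):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            best = min(members, key=lambda i: math.dist(vectors[i], centroids[c]))
            keywords.append(words[best])
    return keywords

# Toy stand-ins for contextual word vectors: three topics, three words each.
words = ["election", "vote", "senate", "match", "goal", "league",
         "stock", "market", "shares"]
vectors = [(0.9, 0.1), (1.0, 0.2), (0.8, 0.0),
           (0.1, 0.9), (0.0, 1.0), (0.2, 0.8),
           (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]

print(extract_keywords(words, vectors, k=3))
```

Because each keyword comes from a different cluster, the selected set covers several topics instead of repeating near-synonyms from one topic, which is the diversity property the abstract emphasizes. In a real pipeline the stand-in vectors would be replaced by embeddings from a pre-trained BERT model.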