Improved Deep Bi-directional Transformer Keyword Extraction based on Semantic Understanding of News

2022 9th International Conference on Dependable Systems and Their Applications (DSA). Pub Date: 2022-08-01. DOI: 10.1109/DSA56465.2022.00110

Rui Cheng, Haijun Zhang


Abstract

Existing keyword extraction methods tend to neglect semantic information and to produce keywords that lack diversity. To address these problems, this paper proposes an improved deep bi-directional transformer model based on semantic understanding of news, which combines pre-trained word vectors with the K-Means algorithm. The BERT pre-trained model first extracts context-dependent word vectors rich in semantic information; the K-Means clustering algorithm then groups these vectors into clusters that correspond to different topics. The extracted keywords semantically highlight the central theme while also alleviating the lack of keyword diversity. Experiments show that the proposed model significantly improves accuracy, recall, and F-value compared with the RAKE, TF-IDF, LDA, RNN, and LSTM keyword extraction models and with word2vec models, which extract static word vectors.
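
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that one vector per candidate word is already available (in the paper these would come from BERT; here hand-written 2-D toy vectors stand in for them), and it assumes that the keyword for each cluster is the word closest to the cluster centroid, a selection rule the abstract does not specify. The names `kmeans` and `select_keywords` are hypothetical.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(vectors, k, iters=50):
    """Lloyd's K-Means with deterministic farthest-point seeding (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(_dist(v, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: _dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = _mean(members)
    return labels, centroids


def select_keywords(words, vectors, k):
    """Return one keyword per cluster: the word nearest the centroid (assumed rule)."""
    labels, centroids = kmeans(vectors, k)
    keywords = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            best = min(members, key=lambda i: _dist(vectors[i], centroids[j]))
            keywords.append(words[best])
    return keywords


# Toy stand-ins for contextual word vectors; real BERT vectors have hundreds of dimensions.
words = ["election", "vote", "ballot", "stock", "market", "shares"]
vectors = [
    [1.0, 0.0], [0.9, 0.1], [0.7, 0.2],  # a "politics" topic
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.7],  # a "finance" topic
]
print(select_keywords(words, vectors, k=2))
```

Taking one word per cluster is what gives the result its diversity: each keyword represents a different topic group rather than several near-synonyms from the dominant topic.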

