Improved Deep Bi-directional Transformer Keyword Extraction based on Semantic Understanding of News
Rui Cheng, Haijun Zhang
2022 9th International Conference on Dependable Systems and Their Applications (DSA), published 2022-08-01
DOI: 10.1109/DSA56465.2022.00110 (https://doi.org/10.1109/DSA56465.2022.00110)
Citations: 0
Abstract
Existing keyword extraction methods tend to neglect semantic information and produce keywords that lack diversity. To address these problems, this paper proposes an improved deep bi-directional transformer model based on semantic understanding of news, combining pre-trained word vectors with the K-Means algorithm. The pre-trained BERT model first extracts word vectors that carry rich semantic information from the surrounding context; the K-Means clustering algorithm then groups these vectors into clusters corresponding to different topics. The extracted keywords semantically highlight the central theme while also alleviating the lack of keyword diversity. Experiments show that the proposed model significantly improves accuracy, recall, and F-value compared with the RAKE, TF-IDF, LDA, RNN, and LSTM keyword extraction models and with word2vec models that extract static word vectors.
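The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the contextual word vectors have already been computed, and uses hand-made 2-D toy vectors as stand-ins for BERT embeddings. Picking the word nearest each cluster centroid as the keyword is also an assumption, since the abstract does not state the selection rule. The names `kmeans` and `extract_keywords` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(vectors, k, iterations=20):
    """Plain K-Means with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from all seeds chosen so far.
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids


def extract_keywords(words, vectors, k):
    """Return one keyword per cluster: the word nearest the centroid.

    Assumption: this selection rule is illustrative, not from the paper.
    """
    labels, centroids = kmeans(vectors, k)
    keywords = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            best = min(
                members,
                key=lambda i: squared_distance(vectors[i], centroids[j]),
            )
            keywords.append(words[best])
    return keywords


# Toy stand-ins for contextual embeddings (assumption: real BERT vectors
# would be high-dimensional, typically 768 values per token).
words = ["election", "vote", "ballot", "match", "goal", "striker"]
vectors = [
    [1.0, 0.0], [0.9, 0.1], [0.7, 0.2],   # politics-like words
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.7],   # sports-like words
]
print(extract_keywords(words, vectors, k=2))
```

Because each cluster contributes one keyword, the result covers both toy topics instead of concentrating on one, which is the diversity effect the abstract attributes to clustering.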