A Keyphrase Extraction Method Based on Multi-feature Evaluation and Mask Mechanism

Liwen Ma, Weifeng Liu
Published in: 2022 11th International Conference on Control, Automation and Information Sciences (ICCAIS)
Publication date: 2022-11-21
DOI: https://doi.org/10.1109/ICCAIS56082.2022.9990092
Citations: 0

Abstract

Keyphrase extraction aims to identify the phrases in a document that capture its core content. However, existing unsupervised keyphrase extraction models focus on a single feature, which leads to biased results. To address this problem, the proposed method scores candidate keyphrases using three features: semantic importance, topic diversity, and position. First, each candidate keyphrase is masked out of the document, and the Manhattan distance between the masked document and the original document is taken as the semantic importance feature. Second, the topic-word distribution of each candidate keyphrase is computed as the topic diversity feature, and its position feature is computed. Finally, the phrase importance score is obtained by integrating the three sub-models. Experiments on three academic datasets, with comparisons against six state-of-the-art baseline models, show that the method outperforms existing methods. The results show that evaluating phrase importance from multiple features significantly improves keyphrase extraction performance.
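
The abstract does not say which encoder, topic model, position formula, or integration rule is used, so the following is only a minimal Python sketch under stated assumptions. A term-count vector stands in for the document representation, `topic_score` is a hypothetical precomputed topic diversity value, position is scored as the reciprocal of the first occurrence index, and the three features are simply multiplied. None of these choices, and none of the function names, come from the paper.

```python
from collections import Counter


def embed(tokens, vocab):
    """Toy stand-in for a document encoder (assumption): term counts over vocab."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def mask(tokens, phrase, mask_token="[MASK]"):
    """Replace every occurrence of phrase (a list of tokens) with mask tokens."""
    if not phrase:
        return list(tokens)
    out, i, n = [], 0, len(phrase)
    while i < len(tokens):
        if tokens[i:i + n] == phrase:
            out.extend([mask_token] * n)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def semantic_importance(tokens, phrase, vocab):
    """Distance between the original document and the document with phrase masked."""
    return manhattan(embed(tokens, vocab), embed(mask(tokens, phrase), vocab))


def position_score(tokens, phrase):
    """Assumed position feature: 1 / (index of first occurrence + 1), else 0."""
    n = len(phrase)
    if n == 0:
        return 0.0
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return 1.0 / (i + 1)
    return 0.0


def combined_score(tokens, phrase, vocab, topic_score):
    """Assumed integration: product of the three feature values.

    topic_score is a hypothetical, externally computed topic diversity value.
    """
    return (semantic_importance(tokens, phrase, vocab)
            * topic_score
            * position_score(tokens, phrase))
```

In this sketch, a candidate whose masking changes the document vector more, and which appears earlier, receives a higher score. A real system would likely replace `embed` with a pretrained sentence encoder and derive `topic_score` from a fitted topic model.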