Keyphrases extraction: Approach Based on Document Paragraph Weights

Lahbib Ajallouda, A. Zellou, Imane Ettahiri, Karim Doumi
Published in: 2022 International Conference on Computational Modelling, Simulation and Optimization (ICCMSO), 2022-12-01
DOI: 10.1109/ICCMSO58359.2022.00051 (https://doi.org/10.1109/ICCMSO58359.2022.00051)
Citations: 0

Abstract

In recent years, sentence embedding techniques in natural language processing have encouraged new methods for extracting keyphrases from documents. Most of these approaches select keyphrases from a set of candidate phrases based on their semantic proximity to the document as a whole. However, most documents contain supplementary paragraphs that are unrelated to the main topics covered, which makes the semantic proximity between candidate keyphrases and the document less reliable. Taking document paragraph weights into account during the semantic similarity calculation should therefore improve keyphrase extraction performance. In this paper, we propose a new keyphrase extraction method based on document paragraph weights. The method relies on sentence embedding techniques and on the semantic proximity of candidate keyphrases to the document's paragraphs. We evaluated the proposed method on three datasets, Inspec, SemEval-2010 and KPTimes, and the results show that using document paragraph weights improves keyphrase extraction performance.
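
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each candidate phrase and each paragraph already has an embedding vector, that paragraph weights are supplied as non-negative numbers, and that a candidate's score is the weight-normalised average of its cosine similarity to each paragraph. The function names (`cosine`, `weighted_score`, `rank_candidates`) and the toy two-dimensional vectors are hypothetical and chosen purely for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_score(candidate: Vector,
                   paragraphs: List[Vector],
                   weights: List[float]) -> float:
    """Assumed scoring rule: weight-normalised mean of paragraph similarities."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * cosine(candidate, p)
               for p, w in zip(paragraphs, weights)) / total


def rank_candidates(candidates: Dict[str, Vector],
                    paragraphs: List[Vector],
                    weights: List[float],
                    top_k: int = 5) -> List[str]:
    """Return the top_k candidate phrases, highest weighted score first."""
    scored = {phrase: weighted_score(vec, paragraphs, weights)
              for phrase, vec in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


# Toy data (hypothetical): two on-topic paragraphs and one off-topic paragraph
# that receives a low weight.
paragraphs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights = [1.0, 1.0, 0.1]
candidates = {
    "keyphrase extraction": [1.0, 0.1],
    "funding acknowledgement": [0.1, 1.0],
}

ranking = rank_candidates(candidates, paragraphs, weights)
print(ranking)
```

With this rule, a candidate that is close only to a low-weight paragraph scores lower than it would under uniform weights, which is the effect the abstract attributes to paragraph weighting. In practice the vectors would come from a sentence embedding model, and how the weights are derived is left open here because the abstract does not specify it.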