Keyphrase extraction using graph-based statistical approach with NLP patterns

Siddhesh Mehta, Rushikesh Karwa, Rahul Chavan, Vaibhav Khatavkar, Amit Joshi
{"title":"Keyphrase extraction using graph-based statistical approach with NLP patterns","authors":"Siddhesh Mehta, Rushikesh Karwa, Rahul Chavan, Vaibhav Khatavkar, Amit Joshi","doi":"10.1007/s12046-024-02494-z","DOIUrl":null,"url":null,"abstract":"<p>Extracting keyphrases plays a vital role in the field of natural language processing, that focuses on recognizing and retrieving significant phrases that summarize the essential information in a document. This research paper introduces a novel approach to extract keyphrases using a statistical approach based on graphs that incorporates degree centrality, TextRank, closeness, and betweenness measures and natural language processing patterns. This approach involves constructing a graph representation of the document and identifying the most important nodes in the graph and leveraging natural language processing patterns to enhance the accuracy and relevance of the extracted keyphrases. The proposed model is examined on a standard dataset for performance evaluation and its outcomes are evaluated by comparing them with the state-of-art methods for extracting keyphrases. The precision, recall, and F-measure achieved by the proposed model are 0.5263, 0.5498, and 0.5323, respectively which shows that proposed model outperforms existing models. The principal novelty of this methodology resides in the utilization of statistical techniques based on graphs and patterns of natural language processing, which enable the detection of the most pertinent nodes and keyphrases of utmost significance. The proposed approach is generalizable to a wide range of domains and text types, making it a promising approach for keyphrase extraction in various applications, including content analysis, document classification, and search engine optimization. In conclusion, the proposed approach offers a robust and scalable solution for identifying keyphrases that capture the essential information of a document. Future research can build upon this approach to improve the efficiency and effectiveness of automated text analysis.</p>","PeriodicalId":21498,"journal":{"name":"Sādhanā","volume":"31 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sādhanā","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s12046-024-02494-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Extracting keyphrases plays a vital role in the field of natural language processing, that focuses on recognizing and retrieving significant phrases that summarize the essential information in a document. This research paper introduces a novel approach to extract keyphrases using a statistical approach based on graphs that incorporates degree centrality, TextRank, closeness, and betweenness measures and natural language processing patterns. This approach involves constructing a graph representation of the document and identifying the most important nodes in the graph and leveraging natural language processing patterns to enhance the accuracy and relevance of the extracted keyphrases. The proposed model is examined on a standard dataset for performance evaluation and its outcomes are evaluated by comparing them with the state-of-art methods for extracting keyphrases. The precision, recall, and F-measure achieved by the proposed model are 0.5263, 0.5498, and 0.5323, respectively which shows that proposed model outperforms existing models. The principal novelty of this methodology resides in the utilization of statistical techniques based on graphs and patterns of natural language processing, which enable the detection of the most pertinent nodes and keyphrases of utmost significance. The proposed approach is generalizable to a wide range of domains and text types, making it a promising approach for keyphrase extraction in various applications, including content analysis, document classification, and search engine optimization. In conclusion, the proposed approach offers a robust and scalable solution for identifying keyphrases that capture the essential information of a document. Future research can build upon this approach to improve the efficiency and effectiveness of automated text analysis.

Abstract Image

利用基于图的统计方法和 NLP 模式提取关键词
提取关键短语在自然语言处理领域发挥着至关重要的作用,其重点是识别和检索概括文档基本信息的重要短语。本研究论文介绍了一种提取关键短语的新方法,该方法采用基于图的统计方法,结合了度中心性、TextRank、接近度和间隔度量以及自然语言处理模式。这种方法包括构建文档的图表示法,识别图中最重要的节点,并利用自然语言处理模式来提高提取关键词的准确性和相关性。我们在标准数据集上对所提出的模型进行了性能评估,并将其结果与最先进的关键词提取方法进行了比较。拟议模型的精确度、召回率和 F 测量值分别为 0.5263、0.5498 和 0.5323,这表明拟议模型优于现有模型。该方法的主要创新点在于利用了基于自然语言处理图和模式的统计技术,从而能够检测出最相关的节点和最重要的关键词。所提出的方法适用于广泛的领域和文本类型,使其成为内容分析、文档分类和搜索引擎优化等各种应用中提取关键词的有效方法。总之,所提出的方法提供了一种稳健且可扩展的解决方案,可用于识别捕捉文档基本信息的关键词。未来的研究可以在此方法的基础上提高自动文本分析的效率和效果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信