An Explorative Study on Extractive Text Summarization through k-means, LSA, and TextRank

2023 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET) | Pub Date: 2023-03-29 | DOI: 10.1109/WiSPNET57748.2023.10134303

K. Ramani, K. Bhavana, A. Akshaya, K. Harshita, C. R. Thoran Kumar, Maya Srikanth



Abstract

Text summarization is one of the most difficult and exciting problems in Natural Language Processing (NLP). Understanding the main objective of any type of document is crucial. Applications of text summarization include media monitoring, social media, marketing, health care, literature, and books. This study applies extractive summarization in the health care domain, specifically to patient health histories. To make a lengthy patient health history document quick to review, we use machine learning techniques, namely k-means, TextRank, and Latent Semantic Analysis (LSA), to identify the sections that convey important information and assemble them into a summary. The methods are evaluated with the ROUGE-1, ROUGE-2, and ROUGE-N metrics, which measure how similar the extracted text is to a reference. k-means outperformed TextRank and LSA in summarizing the documents, achieving an average of 94.52% precision, 90.98% recall, and a 91.25% F1-score.
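The abstract reports precision, recall, and F1-score under ROUGE-1 and ROUGE-2. As a rough illustration of how such scores are derived from n-gram overlap, here is a minimal sketch. It is not the authors' evaluation code: the function names, the lowercase whitespace tokenization, and the two example sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) for n-gram overlap.

    Assumption: tokens are lowercase, whitespace-separated words.
    Published ROUGE toolkits may stem or filter tokens differently.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is only credited as often as the reference has it.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical sentences, invented for this example (not data from the paper).
extracted = "the patient has a history of asthma"
reference = "patient has history of asthma and diabetes"

rouge1 = rouge_n(extracted, reference, n=1)
rouge2 = rouge_n(extracted, reference, n=2)
print("ROUGE-1 (P, R, F1):", rouge1)
print("ROUGE-2 (P, R, F1):", rouge2)
```

In this sketch, precision is the share of the extracted summary's n-grams that also occur in the reference, recall is the share of the reference's n-grams that the summary recovers, and F1 is their harmonic mean.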

