A Multi-Metric Model for analyzing and comparing extractive text summarization approaches and algorithms on scientific papers

Mehmet Ali Dursun, Soydan Serttaş
DÜMF Mühendislik Dergisi, published 2024-02-18. DOI: 10.24012/dumf.1376978 (https://doi.org/10.24012/dumf.1376978)

Abstract

As data and information continue to proliferate, text summarization technologies play a critical role in making large amounts of text more accessible and meaningful. In business, the news industry, academic research, and many other fields, text summarization supports quick decisions, faster access to information, and more effective management of resources. Research in the field continues to improve these technologies and to develop new methods and algorithms that produce better summaries, which makes text summarization and its study highly important in the information age. In this study, a new operating model for text summarization that can be applied to different algorithms is proposed and evaluated. Sixteen summarization algorithms covering six approaches (statistical, graph-based, content-based, pointer-based, position-based, and user-oriented) were implemented and tested on 50 different full-text article datasets. Four evaluation metrics (BLEU, ROUGE-N, ROUGE-L, METEOR) were used to assess the similarity between the generated summaries and the original summaries. The performance of the algorithms within each approach was averaged, and the overall best-performing algorithm was selected. This best algorithm was then analyzed further through topic modeling and keyword extraction to identify the key topics and keywords in the summarized text. The proposed model provides a standardized workflow for developing and thoroughly testing summarization algorithms across datasets and evaluation metrics in order to determine the most appropriate summarization approach. The study demonstrates the effectiveness of the model on a variety of algorithm types and text sources.
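The scoring and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenization, the use of recall for ROUGE-N and F1 for ROUGE-L, and the reading of "averaged within each approach" as a simple mean are all assumptions made for illustration. BLEU and METEOR are omitted to keep the example short.

```python
from collections import Counter
from statistics import mean


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    return sum((cand & ref).values()) / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed variant)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_approach(scores_by_approach):
    """Average the scores within each approach and return the top approach.

    `scores_by_approach` is a hypothetical mapping from approach name to a
    list of per-algorithm scores; the paper's actual aggregation may differ.
    """
    averages = {name: mean(scores) for name, scores in scores_by_approach.items()}
    return max(averages, key=averages.get), averages
```

In this sketch, each generated summary would be scored against the article's original abstract with `rouge_n_recall` and `rouge_l_f1`, the per-algorithm scores would be grouped by approach, and `best_approach` would return the approach with the highest mean together with all the averages.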