Alessia Auriemma Citarella, Marcello Barbella, Madalina G. Ciobanu, Fabiola De Marco, Luigi Di Biasi, Genoveffa Tortora
Title: Assessing the effectiveness of ROUGE as unbiased metric in Extractive vs. Abstractive summarization techniques
Journal: Journal of Computational Science, volume 87, Article 102571 (Impact Factor 3.1; JCR Q2, Computer Science, Interdisciplinary Applications)
Published: 2025-03-18
DOI: 10.1016/j.jocs.2025.102571
URL: https://www.sciencedirect.com/science/article/pii/S1877750325000481
Citations: 0
Abstract
Approaches to Automatic Text Summarization (TS) try to extract key information from one or more input texts and generate summaries while preserving the meaning of the content. These approaches fall into two groups, Extractive and Abstractive, which differ in how they operate. Extractive summarization selects sentences directly from the document text, whereas abstractive summarization creates a summary by interpreting the text and rewriting sentences, often with new words. It is important to assess how similar a summary is to the original text independently of the particular TS algorithm adopted. The literature proposes various metrics and scores for evaluating text summarization results, of which ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is the most widely used. In this study, our main objective is to evaluate how the ROUGE metric performs when applied to both Extractive and Abstractive summarization algorithms. We aim to understand its effectiveness and reliability as an independent and unbiased metric for assessing the quality of summaries generated by these different approaches. We conducted a first experiment to compare the efficiency of the metric (ROUGE-1, ROUGE-2 and ROUGE-L) for evaluating Abstractive (word2vec, doc2vec, and GloVe) versus Extractive text summarization algorithms (TextRank, LSA, Luhn, LexRank), and a second one to compare the scores obtained by two different summarization approaches: a single execution of one summarization algorithm versus a multiple execution of different algorithms on the same text. Our study shows that ROUGE reaches similar results for Abstractive and Extractive algorithms. Moreover, our findings indicate that multiple executions, in which two text summarization algorithms are run sequentially on the same text, generally outperform single executions of a single text summarization algorithm.
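To make the three ROUGE variants named in the abstract concrete, the following is a minimal Python sketch of how they are commonly defined: ROUGE-1 and ROUGE-2 as clipped unigram and bigram overlap, and ROUGE-L via the longest common subsequence of tokens. It is an illustration only and not the authors' implementation. The function names (`rouge_n`, `rouge_l`, `lcs_length`), the lowercased whitespace tokenization, and the absence of stemming are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


def rouge_n(candidate, reference, n=1):
    """ROUGE-N as clipped n-gram overlap; returns (precision, recall, f1).

    Assumption: tokens are lowercased whitespace-separated words, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # per-n-gram minimum count
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall, f1(precision, recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L from the token-level LCS; returns (precision, recall, f1)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    precision = lcs / max(len(cand), 1)
    recall = lcs / max(len(ref), 1)
    return precision, recall, f1(precision, recall)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print("ROUGE-1:", rouge_n(candidate, reference, 1))
    print("ROUGE-2:", rouge_n(candidate, reference, 2))
    print("ROUGE-L:", rouge_l(candidate, reference))
```

Because every score here depends on surface token overlap, an extractive summary that copies sentences verbatim and an abstractive one that paraphrases can receive different scores for similar content, which is the kind of potential bias the study sets out to examine.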
About the journal:
Computational Science is a rapidly growing multi- and interdisciplinary field that uses advanced computing and data analysis to understand and solve complex problems. It has reached a level of predictive capability that now firmly complements the traditional pillars of experimentation and theory.
Recent advances in experimental techniques, such as detectors, on-line sensor networks and high-resolution imaging, have opened up new windows into physical and biological processes at many levels of detail. The resulting data explosion allows for detailed data-driven modeling and simulation.
This new discipline in science combines computational thinking, modern computational methods, devices and collateral technologies to address problems far beyond the scope of traditional numerical methods.
Computational science typically unifies three distinct elements:
• Modeling, Algorithms and Simulations (e.g. numerical and non-numerical, discrete and continuous);
• Software developed to solve science (e.g., biological, physical, and social), engineering, medicine, and humanities problems;
• Computer and information science that develops and optimizes the advanced system hardware, software, networking, and data management components (e.g. problem solving environments).