A novel unsupervised fine-tuning method for text summarization, and highlighting the limitations of ROUGE score

Ala Alam Falaki, Robin Gras
Machine Learning with Applications, Volume 20, Article 100666, published 2025-05-22
DOI: 10.1016/j.mlwa.2025.100666
The limited availability of datasets for text summarization tasks, and the similar characteristics of the datasets that do exist (e.g., news articles), make it crucial to focus on unsupervised learning techniques that enable summarization across different domains. Moreover, since summarization produces text output, effective methods developed for news articles can be applied to other domains lacking sufficient labeled data. This study introduces a novel target selection process to be used as an unsupervised learning method for fine-tuning text summarization models with unlabeled data. The process involves two steps: first, generating an extractive summary (Ext-Reference) from the article, and second, using an abstractive model to create a pool of candidate summaries. The most suitable summary (to be used as the target) is then selected by calculating the cosine similarity between the Ext-Reference's embedding and each candidate's embedding. Furthermore, this project underscores the limitations of the ROUGE score, which assigns a relatively low score to this method. However, extended analysis with various metrics, including using GPT-4 as a judge, demonstrates the effectiveness of this technique for fine-tuning models without a specific target reference. It highlights the importance of using a combination of metrics, like those included in the SumEvaluator package released alongside this paper. SumEvaluator package on GitHub: https://github.com/AlaFalaki/SumEvaluator.
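The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name an embedding model, so the example uses toy stand-in vectors where a real pipeline would embed the Ext-Reference and each candidate summary with a sentence-encoder. The candidate whose embedding has the highest cosine similarity to the Ext-Reference's embedding is chosen as the fine-tuning target.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_target(ext_reference_emb: np.ndarray,
                  candidate_embs: list) -> int:
    """Return the index of the candidate summary whose embedding is
    closest (by cosine similarity) to the Ext-Reference's embedding."""
    scores = [cosine_similarity(ext_reference_emb, c) for c in candidate_embs]
    return int(np.argmax(scores))

# Toy demonstration with 3-dimensional stand-in embeddings; a real run
# would produce these vectors with a sentence-embedding model.
ref = np.array([1.0, 0.0, 0.0])
candidates = [
    np.array([0.0, 1.0, 0.0]),  # orthogonal to ref -> similarity 0
    np.array([0.9, 0.1, 0.0]),  # nearly aligned -> highest similarity
    np.array([0.5, 0.5, 0.0]),  # partial overlap
]
print(select_target(ref, candidates))  # prints 1
```

The index returned would identify the candidate summary to use as the pseudo-target when fine-tuning the abstractive model on unlabeled articles.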