Comparative Study on Automated Reference Summary Generation using BERT Models and ROUGE Score Assessment

Q4 Multidisciplinary

Journal of Current Science and Technology, Pub Date: 2024-05-02, DOI: 10.59796/jcst.v14n2.2024.26

Nattapong Sanchan

Citations: 0

Abstract

Automatic text summarization is a sub-area of text mining in which a computer system identifies the most informative content in the original text and produces a summary for particular tasks and users. In developing such systems, one of the most important tasks is evaluating the quality of the summaries they produce. Evaluation is generally laborious, time-consuming, and expensive because it requires substantial annotation effort from humans, who must create the reference summaries manually. Generating reference summaries automatically would therefore speed up both the development and the evaluation of summarization systems. In this paper, we propose an Auto-Ref Summary Generation framework for automatically generating the reference summaries used in the generic text summarization evaluation task, the Sliced Summary. Given a set of clusters from a dataset of ground-truth cluster labels, variants of BERT models were used to create cluster representations. The automatic reference summaries were then generated through a centroid-based summarization approach. Overall, DistilBERT, RoBERTa, and SBERT played crucial roles in automatic summary generation, achieving a highest ROUGE-1 score of 0.47060. However, the results did not meet our expectations for text coherence and readability. Although the summaries generated through our proposed framework cannot replace manual summaries, this study sheds new light on acquiring automatic reference summaries from a ground-truth label dataset.
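The two technical ingredients named in the abstract, centroid-based sentence selection and ROUGE-1 scoring, can be illustrated with a minimal sketch. It is not the paper's implementation: a toy bag-of-words vector stands in for the DistilBERT, RoBERTa, or SBERT embeddings, the ROUGE-1 function is a simplified unigram-overlap version without stemming, and all function names (`embed`, `centroid_summary`, `rouge1`) and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(sentence, vocab):
    """Toy bag-of-words vector (assumption: stand-in for a BERT-style embedding)."""
    counts = Counter(tokenize(sentence))
    return [float(counts[word]) for word in vocab]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid_summary(cluster, k=1):
    """Return the k sentences of a cluster that lie closest to its centroid."""
    vocab = sorted({word for s in cluster for word in tokenize(s)})
    vectors = [embed(s, vocab) for s in cluster]
    # The centroid is the element-wise mean of all sentence vectors.
    centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
    ranked = sorted(range(len(cluster)),
                    key=lambda i: cosine(vectors[i], centroid),
                    reverse=True)
    # Keep the selected sentences in their original order.
    return [cluster[i] for i in sorted(ranked[:k])]


def rouge1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap as recall, precision, F1."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical cluster of sentences sharing one ground-truth label.
cluster = [
    "The battery life is great.",
    "Battery life lasts all day.",
    "The screen is too dim.",
]
auto_reference = centroid_summary(cluster, k=1)
scores = rouge1("the cat sat", "the cat sat on the mat")
```

In this toy cluster the first sentence shares the most vocabulary with the others, so it sits closest to the centroid and is selected. Swapping `embed` for a real sentence encoder would change only how the vectors are produced; the centroid and ranking steps would stay the same.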

Source journal

Journal of Current Science and Technology (Multidisciplinary)

CiteScore

0.80

Self-citation rate

0.00%

Publication count