Abstractive Text Summarization using Pre-Trained Language Model "Text-to-Text Transfer Transformer (T5)"
Qurrota A’yuna Itsnaini, Mardhiya Hayaty, Andriyan Dwi Putra, N. Jabari
Ilkom Jurnal Ilmiah, published 2023-04-07. DOI: 10.33096/ilkom.v15i1.1532.124-131

Abstract
Automatic Text Summarization (ATS) applies text-processing technology to help humans produce a summary, or the key points, of large numbers of documents. We use Indonesian as the object language because few NLP research resources exist for Indonesian. This paper uses a PLTM (Pre-Trained Language Model) based on the Transformer architecture, namely T5 (Text-to-Text Transfer Transformer), which was previously pre-trained on a larger dataset. Evaluation in this study compares ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores computed between the reference summaries and the model summaries. Experiments fine-tuning the pre-trained t5-base model (220M parameters) on an Indonesian news dataset yielded relatively high ROUGE values: ROUGE-1 = 0.68, ROUGE-2 = 0.61, and ROUGE-L = 0.65. Although the evaluation scores are good, the resulting model has not achieved satisfactory results: in terms of abstraction, it did not work optimally. We also found several errors in the reference summaries of the dataset used.
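To make the text-to-text setup concrete, the sketch below shows how a T5 checkpoint is typically used for summarization with the Hugging Face Transformers library. This is a minimal illustration, not the paper's exact pipeline: the checkpoint name, the "summarize:" task prefix, and the generation settings (beam search, length limits) are common defaults assumed here, and the paper's fine-tuned Indonesian weights are not publicly referenced.

```python
# Minimal T5 summarization sketch (assumed setup, not the paper's exact code).
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Placeholder: the paper fine-tunes t5-base (220M parameters) on an
# Indonesian news dataset; the base checkpoint stands in for those weights.
MODEL_NAME = "t5-base"

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

article = "..."  # placeholder for an Indonesian news article

# T5 casts every task as text-to-text, so summarization uses a task prefix.
inputs = tokenizer(
    "summarize: " + article,
    return_tensors="pt",
    max_length=512,
    truncation=True,
)

# Beam search with an output-length cap; these values are illustrative.
summary_ids = model.generate(
    inputs["input_ids"],
    max_length=150,
    num_beams=4,
    early_stopping=True,
)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```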
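The ROUGE comparison described in the abstract can be reproduced in outline with the rouge-score package (pip install rouge-score). The reference and candidate strings are placeholders, and whether the paper reports recall or F-measure is not stated, so F-measure is printed here as an assumption.

```python
# ROUGE-1 / ROUGE-2 / ROUGE-L between a reference and a model summary.
from rouge_score import rouge_scorer

reference = "..."  # placeholder: human-written reference summary
candidate = "..."  # placeholder: model-generated summary

# The built-in stemmer targets English, so it is disabled for Indonesian.
scorer = rouge_scorer.RougeScorer(
    ["rouge1", "rouge2", "rougeL"],
    use_stemmer=False,
)
scores = scorer.score(reference, candidate)

for name, score in scores.items():
    print(f"{name}: F1 = {score.fmeasure:.2f}")
```

Corpus-level results such as the reported ROUGE-1 = 0.68 would then be obtained by averaging these per-pair scores over the whole test set.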