运用迁移学习和分而治之的方法提高抽象文本摘要的覆盖率和新颖性

IF 1.1 4区 计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Ayham Alomari, N. Idris, Aznul Qalid, I. Alsmadi
{"title":"运用迁移学习和分而治之的方法提高抽象文本摘要的覆盖率和新颖性","authors":"Ayham Alomari, N. Idris, Aznul Qalid, I. Alsmadi","doi":"10.22452/mjcs.vol36no3.4","DOIUrl":null,"url":null,"abstract":"Automatic Text Summarization (ATS) models yield outcomes with insufficient coverage of crucial details and poor degrees of novelty. The first issue resulted from the lengthy input, while the second problem resulted from the characteristics of the training dataset itself. This research employs the divide-and-conquer approach to address the first issue by breaking the lengthy input into smaller pieces to be summarized, followed by the conquest of the results in order to cover more significant details. For the second challenge, these chunks are summarized by models trained on datasets with higher novelty levels in order to produce more human-like and concise summaries with more novel words that do not appear in the input article. The results demonstrate an improvement in both coverage and novelty levels. Moreover, we defined a new metric to measure the novelty of the summary. Finally, we investigated the findings to discover whether the novelty is influenced more by the dataset itself, as in CNN/DM, or by the training model and its training objective, as in Pegasus.","PeriodicalId":49894,"journal":{"name":"Malaysian Journal of Computer Science","volume":" ","pages":""},"PeriodicalIF":1.1000,"publicationDate":"2023-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"IMPROVING COVERAGE AND NOVELTY OF ABSTRACTIVE TEXT SUMMARIZATION USING TRANSFER LEARNING AND DIVIDE AND CONQUER APPROACHES\",\"authors\":\"Ayham Alomari, N. Idris, Aznul Qalid, I. Alsmadi\",\"doi\":\"10.22452/mjcs.vol36no3.4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic Text Summarization (ATS) models yield outcomes with insufficient coverage of crucial details and poor degrees of novelty. The first issue resulted from the lengthy input, while the second problem resulted from the characteristics of the training dataset itself. This research employs the divide-and-conquer approach to address the first issue by breaking the lengthy input into smaller pieces to be summarized, followed by the conquest of the results in order to cover more significant details. For the second challenge, these chunks are summarized by models trained on datasets with higher novelty levels in order to produce more human-like and concise summaries with more novel words that do not appear in the input article. The results demonstrate an improvement in both coverage and novelty levels. Moreover, we defined a new metric to measure the novelty of the summary. Finally, we investigated the findings to discover whether the novelty is influenced more by the dataset itself, as in CNN/DM, or by the training model and its training objective, as in Pegasus.\",\"PeriodicalId\":49894,\"journal\":{\"name\":\"Malaysian Journal of Computer Science\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2023-07-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Malaysian Journal of Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.22452/mjcs.vol36no3.4\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Malaysian Journal of Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.22452/mjcs.vol36no3.4","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

自动文本摘要(ATS)模型产生的结果对关键细节的覆盖不足,新颖性较差。第一个问题是由于输入时间过长,而第二个问题是因为训练数据集本身的特性。这项研究采用了分而治之的方法来解决第一个问题,将冗长的输入分解成更小的部分进行总结,然后征服结果,以涵盖更重要的细节。对于第二个挑战,这些块由在具有更高新颖性水平的数据集上训练的模型进行总结,以便用输入文章中没有出现的更新颖的单词生成更人性化和简洁的摘要。结果表明,覆盖率和新颖性都有所提高。此外,我们定义了一个新的度量标准来衡量摘要的新颖性。最后,我们调查了这些发现,以发现新颖性是更多地受到数据集本身的影响,如在CNN/DM中,还是受到训练模型及其训练目标的影响,例如在飞马座中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
IMPROVING COVERAGE AND NOVELTY OF ABSTRACTIVE TEXT SUMMARIZATION USING TRANSFER LEARNING AND DIVIDE AND CONQUER APPROACHES
Automatic Text Summarization (ATS) models yield outcomes with insufficient coverage of crucial details and poor degrees of novelty. The first issue resulted from the lengthy input, while the second problem resulted from the characteristics of the training dataset itself. This research employs the divide-and-conquer approach to address the first issue by breaking the lengthy input into smaller pieces to be summarized, followed by the conquest of the results in order to cover more significant details. For the second challenge, these chunks are summarized by models trained on datasets with higher novelty levels in order to produce more human-like and concise summaries with more novel words that do not appear in the input article. The results demonstrate an improvement in both coverage and novelty levels. Moreover, we defined a new metric to measure the novelty of the summary. Finally, we investigated the findings to discover whether the novelty is influenced more by the dataset itself, as in CNN/DM, or by the training model and its training objective, as in Pegasus.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Malaysian Journal of Computer Science
Malaysian Journal of Computer Science COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, THEORY & METHODS
CiteScore
2.20
自引率
33.30%
发文量
35
审稿时长
7.5 months
期刊介绍: The Malaysian Journal of Computer Science (ISSN 0127-9084) is published four times a year in January, April, July and October by the Faculty of Computer Science and Information Technology, University of Malaya, since 1985. Over the years, the journal has gained popularity and the number of paper submissions has increased steadily. The rigorous reviews from the referees have helped in ensuring that the high standard of the journal is maintained. The objectives are to promote exchange of information and knowledge in research work, new inventions/developments of Computer Science and on the use of Information Technology towards the structuring of an information-rich society and to assist the academic staff from local and foreign universities, business and industrial sectors, government departments and academic institutions on publishing research results and studies in Computer Science and Information Technology through a scholarly publication.  The journal is being indexed and abstracted by Clarivate Analytics'' Web of Science and Elsevier''s Scopus
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信