{"title":"基于面向复制的上下文感知和抽象摘要加权奖励的深度强化学习","authors":"Caidong Tan","doi":"10.1145/3590003.3590019","DOIUrl":null,"url":null,"abstract":"This paper presents a deep context-aware model with a copy mechanism based on reinforcement learning for abstractive text summarization. Our model is optimized using weighted ROUGEs as global prediction-based rewards and the self-critical policy gradient training algorithm, which can reduce the inconsistency between training and testing by directly optimizing the evaluation metrics. To alleviate the lexical diversity and component diversity problems caused by global prediction rewards, we improve the richness of the multi-head self-attention mechanism to capture context through global deep context representation with copy mechanism. We conduct experiments and demonstrate that our model outperforms many existing benchmarks over the Gigaword, LCSTS, and CNN/DM datasets. The experimental results demonstrate that our model has a significant effect on improving the quality of summarization.","PeriodicalId":340225,"journal":{"name":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Deep Reinforcement Learning with Copy-oriented Context Awareness and Weighted Rewards for Abstractive Summarization\",\"authors\":\"Caidong Tan\",\"doi\":\"10.1145/3590003.3590019\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a deep context-aware model with a copy mechanism based on reinforcement learning for abstractive text summarization. Our model is optimized using weighted ROUGEs as global prediction-based rewards and the self-critical policy gradient training algorithm, which can reduce the inconsistency between training and testing by directly optimizing the evaluation metrics. To alleviate the lexical diversity and component diversity problems caused by global prediction rewards, we improve the richness of the multi-head self-attention mechanism to capture context through global deep context representation with copy mechanism. We conduct experiments and demonstrate that our model outperforms many existing benchmarks over the Gigaword, LCSTS, and CNN/DM datasets. 
The experimental results demonstrate that our model has a significant effect on improving the quality of summarization.\",\"PeriodicalId\":340225,\"journal\":{\"name\":\"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3590003.3590019\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3590003.3590019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Deep Reinforcement Learning with Copy-oriented Context Awareness and Weighted Rewards for Abstractive Summarization
This paper presents a deep context-aware model with a copy mechanism, trained with reinforcement learning, for abstractive text summarization. The model is optimized with weighted ROUGE scores as global prediction-based rewards using the self-critical policy gradient training algorithm, which reduces the mismatch between training and testing by directly optimizing the evaluation metrics. To alleviate the lexical-diversity and component-diversity problems caused by global prediction rewards, we enrich the context captured by the multi-head self-attention mechanism through a global deep context representation combined with the copy mechanism. Experiments show that our model outperforms many existing baselines on the Gigaword, LCSTS, and CNN/DM datasets, and the results demonstrate that it significantly improves summarization quality.
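To make the training objective concrete, the following is a minimal sketch of self-critical policy-gradient training with a weighted-ROUGE reward. It is not the authors' implementation: the helper names (rouge_n, rouge_l, weighted_rouge, scst_loss) and the reward weights are illustrative assumptions, and ROUGE is computed on pre-tokenized token lists.

# Minimal sketch: self-critical policy gradient with a weighted-ROUGE reward.
# Helper names and weights are hypothetical, for illustration only.
from collections import Counter
from typing import Sequence

import torch


def rouge_n(candidate: Sequence[str], reference: Sequence[str], n: int) -> float:
    """ROUGE-N F1 score computed on token lists."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rouge_l(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """ROUGE-L F1 score based on the longest common subsequence."""
    m, n = len(candidate), len(reference)
    if m == 0 or n == 0:
        return 0.0
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if candidate[i] == reference[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / m, lcs / n
    return 2 * precision * recall / (precision + recall)


def weighted_rouge(candidate, reference, weights=(0.3, 0.4, 0.3)) -> float:
    """Weighted combination of ROUGE-1, ROUGE-2, and ROUGE-L (placeholder weights)."""
    return (weights[0] * rouge_n(candidate, reference, 1)
            + weights[1] * rouge_n(candidate, reference, 2)
            + weights[2] * rouge_l(candidate, reference))


def scst_loss(sample_logprobs: torch.Tensor,
              sampled: Sequence[str],
              greedy: Sequence[str],
              reference: Sequence[str]) -> torch.Tensor:
    """Self-critical loss: -(r(sampled) - r(greedy)) * sum of log p(sampled tokens)."""
    advantage = weighted_rouge(sampled, reference) - weighted_rouge(greedy, reference)
    return -advantage * sample_logprobs.sum()

In this formulation the greedy-decoded summary acts as its own baseline, so no learned critic is needed: only sampled summaries whose weighted ROUGE exceeds that of the greedy output receive a positive advantage, which keeps the gradient estimate low-variance while optimizing the test-time metric directly.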