Vietnamese Text Summarization Based on Elementary Discourse Units

Khang Nhut Lam, Tai Ngoc Nguyen, J. Kalita
Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval, published 2022-12-16. DOI: 10.1145/3582768.3582793
Citations: 0

Abstract

This paper presents text summarization models based on elementary discourse units (EDUs) that produce extractive and abstractive summaries of Vietnamese documents. First, we introduce algorithms that use part-of-speech (POS) information to construct EDUs in Vietnamese. The resulting EDUs are then fed into an extractive summarization model built on a pointer network and an abstractive summarization model built on a pointer-generator network. A reinforcement learning method is used to improve the quality of the models. We perform experiments on the CTUNLPSUM dataset, which comprises 1,053,702 Vietnamese documents extracted from online magazines. The EDU-based extractive models outperform extractive models that operate on words or sentences. The best extractive model achieves ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.567, 0.241, and 0.461, respectively; the best abstractive model achieves 0.530, 0.213, and 0.394.
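The results above are reported as ROUGE-1, ROUGE-2, and ROUGE-L scores. To illustrate what those numbers measure, here is a minimal sketch of ROUGE-N and sentence-level ROUGE-L F1 for a single candidate/reference pair. It rests on several assumptions. Tokens are split on whitespace, which for Vietnamese yields syllables rather than multi-syllable words. Scores are F1 values, and there is no stemming or stopword removal. The abstract does not state which ROUGE implementation, tokenizer, or variant (F1 or recall) the authors used. The function names `rouge_n`, `rouge_l`, and `lcs_length` are hypothetical.

```python
from collections import Counter

# Assumption: whitespace tokenization (Vietnamese syllables), F1 scores,
# no stemming; not necessarily the configuration used in the paper.


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision (overlap/cand_total) and recall (overlap/ref_total)."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # & keeps the minimum count per n-gram
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            curr.append(prev[j] + 1 if x == y else max(prev[j + 1], curr[j]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Sentence-level ROUGE-L F1 based on the LCS of the two token sequences."""
    c, r = candidate.split(), reference.split()
    return f1(lcs_length(c, r), len(c), len(r))


if __name__ == "__main__":
    reference = "chính phủ ban hành luật mới"
    candidate = "chính phủ vừa ban hành luật"
    print(round(rouge_n(candidate, reference, 1), 3))  # 0.833 (5 of 6 syllables match)
    print(round(rouge_n(candidate, reference, 2), 3))  # 0.6 (3 of 5 bigrams match)
    print(round(rouge_l(candidate, reference), 3))     # 0.833 (LCS length 5)
```

In this example the inserted syllable "vừa" breaks two bigrams but does not break the common subsequence. That is why ROUGE-2 drops to 0.6 while ROUGE-L stays at 0.833. It also shows why ROUGE-2 scores are typically much lower than ROUGE-1 and ROUGE-L, as in the results above.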