Research and Application of Automatic Text Summarization Technology Based on Deep Learning

Zekai Sun, Xiangru Meng, PiChao Zheng, Xiangning Zhu, Lei Yang
DOI: 10.1109/ICTech55460.2022.00052
Published in: 2022 11th International Conference of Information and Communication Technology (ICTech), 2022-02-01
Citations: 1

Abstract

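Obtaining useful information from the massive volume of data generated on the Internet costs users a great deal of time and effort. A text summary is a condensed expression of an article that captures its main content. Automatic text summarization lets users quickly find the information that is valuable to them and partly alleviates the information overload of the big-data era. In this paper, we use a knowledge-enhanced model that learns real-world semantic relations by modeling entity concepts and other prior semantic knowledge in large-scale data, overcoming the limitation of earlier language models that relied only on raw language signals. We then use a generative pre-training model to address specific problems in natural language generation, such as exposure bias. Experimental results show that the model performs well on the Gigaword and CNN/DailyMail datasets, and that the summaries it generates on the NLPCC 2017 Chinese summarization dataset have good accuracy and readability.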
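The abstract names two ideas without detailing them: modeling entity concepts as prior knowledge, and reducing exposure bias during generation. The sketch below illustrates one common form of each, under stated assumptions. The first is entity-level masking as popularized by Baidu's ERNIE, where whole entities are masked instead of single characters so the model must recover them from context. The second is noise injection into teacher-forced decoder inputs, similar in spirit to the noise-aware generation used in ERNIE-GEN. This is not the authors' implementation: `entity_level_mask`, `add_target_noise`, the `[MASK]` symbol, and the example entity spans are hypothetical, and a real system would obtain entity spans from an NER tool or a knowledge base.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"  # hypothetical mask symbol; real vocabularies define their own


def entity_level_mask(
    tokens: Sequence[str],
    entity_spans: Sequence[Tuple[int, int]],
    mask_ratio: float,
    rng: random.Random,
) -> Tuple[List[str], List[int]]:
    """Mask whole entity spans instead of independent tokens.

    entity_spans holds non-overlapping half-open [start, end) index pairs.
    Spans are masked in random order until at least mask_ratio * len(tokens)
    tokens are covered; a span is never split, so the count may overshoot.
    Returns the masked sequence and the sorted positions the model must predict.
    """
    masked = list(tokens)
    targets: List[int] = []
    budget = max(1, round(mask_ratio * len(tokens)))
    spans = list(entity_spans)
    rng.shuffle(spans)
    for start, end in spans:
        if len(targets) >= budget:
            break
        for i in range(start, end):
            masked[i] = MASK
            targets.append(i)
    return masked, sorted(targets)


def add_target_noise(
    target: Sequence[str],
    vocab: Sequence[str],
    noise_rate: float,
    rng: random.Random,
) -> List[str]:
    """Randomly replace gold decoder-input tokens with vocabulary tokens.

    Only the decoder's input prefix is corrupted; the training loss should still
    be computed against the uncorrupted target, so the model learns to continue
    correctly from imperfect history, as it must at inference time.
    """
    return [rng.choice(vocab) if rng.random() < noise_rate else tok for tok in target]


if __name__ == "__main__":
    rng = random.Random(0)
    # "Harbin is the capital of Heilongjiang", tokenized by character.
    chars = list("哈尔滨是黑龙江的省会")
    entities = [(0, 3), (4, 7)]  # 哈尔滨 (Harbin), 黑龙江 (Heilongjiang)
    masked, targets = entity_level_mask(chars, entities, mask_ratio=0.15, rng=rng)
    print("".join("_" if t == MASK else t for t in masked), targets)
    noisy = add_target_noise(list("省会"), vocab=chars, noise_rate=0.5, rng=rng)
    print(noisy)
```

Seeding `random.Random` keeps runs reproducible. The design choice that matters is masking whole spans rather than sampling positions independently: recovering a masked 黑龙江 from 哈尔滨 and 省会 is meant to push the model toward relational knowledge instead of copying neighbouring characters.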