Research and Application of Automatic Text Summarization Technology Based on Deep Learning

2022 11th International Conference of Information and Communication Technology (ICTech) Pub Date: 2022-02-01 DOI: 10.1109/ICTech55460.2022.00052

Zekai Sun, Xiangru Meng, PiChao Zheng, Xiangning Zhu, Lei Yang


Citations: 1

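Abstract

Extracting useful information from the massive volume of data generated on the Internet costs users considerable time and effort. A text summary is a condensed expression of an article that captures its main content. Text summarization technology lets users quickly obtain the information that is valuable to them and, to some extent, alleviates the information overload of the big data era. In this paper, we use a knowledge-enhanced model that learns real-world semantic relationships by modeling entity concepts and other prior semantic knowledge in massive data, overcoming the limitation of earlier language models that rely only on the raw language signal. A generative pre-training model is then used to address specific problems in natural language generation, such as exposure bias. Experimental results show that the model performs well on the Gigaword and CNN/DailyMail datasets, and that the summaries it generates on the NLPCC 2017 Chinese summarization data have good accuracy and readability.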



