一种基于预训练的Bert和深度神经网络的混合模型，具有丰富的特征，用于抽取文本摘要

Journal of Computer Science and Cybernetics Pub Date : 2021-05-31 DOI:10.15625/1813-9663/37/2/15980

Tuan Minh Luu, H. T. Le, T. Hoang

{"title":"一种基于预训练的Bert和深度神经网络的混合模型，具有丰富的特征，用于抽取文本摘要","authors":"Tuan Minh Luu, H. T. Le, T. Hoang","doi":"10.15625/1813-9663/37/2/15980","DOIUrl":null,"url":null,"abstract":"Deep neural networks have been applied successfully to extractive text summarization tasks with the accompany of large training datasets. However, when the training dataset is not large enough, these models reveal certain limitations that affect the quality of the system’s summary. In this paper, we propose an extractive summarization system basing on a Convolutional Neural Network and a Fully Connected network for sentence selection. The pretrained BERT multilingual model is used to generate embeddings vectors from the input text. These vectors are combined with TF-IDF values to produce the input of the text summarization system. Redundant sentences from the output summary are eliminated by the Maximal Marginal Relevance method. Our system is evaluated with both English and Vietnamese languages using CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results comparing to existing works using the same dataset. It confirms that our approach can be effectively applied to summarize both English and Vietnamese languages.","PeriodicalId":15444,"journal":{"name":"Journal of Computer Science and Cybernetics","volume":"25 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A HYBRID MODEL USING THE PRETRAINED BERT AND DEEP NEURAL NETWORKS WITH RICH FEATURE FOR EXTRACTIVE TEXT SUMMARIZATION\",\"authors\":\"Tuan Minh Luu, H. T. Le, T. Hoang\",\"doi\":\"10.15625/1813-9663/37/2/15980\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks have been applied successfully to extractive text summarization tasks with the accompany of large training datasets. However, when the training dataset is not large enough, these models reveal certain limitations that affect the quality of the system’s summary. In this paper, we propose an extractive summarization system basing on a Convolutional Neural Network and a Fully Connected network for sentence selection. The pretrained BERT multilingual model is used to generate embeddings vectors from the input text. These vectors are combined with TF-IDF values to produce the input of the text summarization system. Redundant sentences from the output summary are eliminated by the Maximal Marginal Relevance method. Our system is evaluated with both English and Vietnamese languages using CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results comparing to existing works using the same dataset. It confirms that our approach can be effectively applied to summarize both English and Vietnamese languages.\",\"PeriodicalId\":15444,\"journal\":{\"name\":\"Journal of Computer Science and Cybernetics\",\"volume\":\"25 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computer Science and Cybernetics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.15625/1813-9663/37/2/15980\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computer Science and Cybernetics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15625/1813-9663/37/2/15980","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

深度神经网络已经成功地应用于大型训练数据集的文本摘要提取任务。然而，当训练数据集不够大时，这些模型会显示出某些局限性，从而影响系统总结的质量。本文提出了一种基于卷积神经网络和全连接网络的句子抽取摘要系统。使用预训练的BERT多语言模型从输入文本生成嵌入向量。这些向量与TF-IDF值相结合，产生文本摘要系统的输入。利用最大边际相关性方法消除输出摘要中的冗余句子。我们的系统分别使用CNN和Baomoi数据集使用英语和越南语进行评估。实验结果表明，与使用相同数据集的现有工作相比，我们的系统取得了更好的效果。这证实了我们的方法可以有效地应用于英语和越南语的总结。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A HYBRID MODEL USING THE PRETRAINED BERT AND DEEP NEURAL NETWORKS WITH RICH FEATURE FOR EXTRACTIVE TEXT SUMMARIZATION

Deep neural networks have been applied successfully to extractive text summarization tasks with the accompany of large training datasets. However, when the training dataset is not large enough, these models reveal certain limitations that affect the quality of the system’s summary. In this paper, we propose an extractive summarization system basing on a Convolutional Neural Network and a Fully Connected network for sentence selection. The pretrained BERT multilingual model is used to generate embeddings vectors from the input text. These vectors are combined with TF-IDF values to produce the input of the text summarization system. Redundant sentences from the output summary are eliminated by the Maximal Marginal Relevance method. Our system is evaluated with both English and Vietnamese languages using CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results comparing to existing works using the same dataset. It confirms that our approach can be effectively applied to summarize both English and Vietnamese languages.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Computer Science and Cybernetics

自引率

0.00%

发文量