A Text Generation and Prediction System: Pre-training on New Corpora Using BERT and GPT-2

Yuanbin Qu, Peihan Liu, Wei Song, Lizhen Liu, Miaomiao Cheng
{"title":"A Text Generation and Prediction System: Pre-training on New Corpora Using BERT and GPT-2","authors":"Yuanbin Qu, Peihan Liu, Wei Song, Lizhen Liu, Miaomiao Cheng","doi":"10.1109/ICEIEC49280.2020.9152352","DOIUrl":null,"url":null,"abstract":"Using a given starting word to make a sentence or filling in sentences is an important direction of natural language processing. From one aspect, it reflects whether the machine can have human thinking and creativity. We train the machine for specific tasks and then use it in natural language processing, which will help solve some sentence generation problems, especially for application scenarios such as summary generation, machine translation, and automatic question answering. The OpenAI GPT-2 and BERT models are currently widely used language models for text generation and prediction. There have been many experiments to verify the outstanding performance of these two models in the field of text generation. This paper will use two new corpora to train OpenAI GPT-2 model, used to generate long sentences and articles, and finally perform a comparative analysis. At the same time, we will use the BERT model to complete the task of predicting intermediate words based on the context.","PeriodicalId":352285,"journal":{"name":"2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICEIEC49280.2020.9152352","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25

Abstract

Generating a sentence from a given starting word, or filling in missing words within a sentence, is an important direction in natural language processing. In one respect, it reflects whether a machine can exhibit human-like thinking and creativity. Training a machine for such specific tasks and then applying it to natural language processing helps solve sentence generation problems, especially in application scenarios such as summary generation, machine translation, and automatic question answering. The OpenAI GPT-2 and BERT models are currently among the most widely used language models for text generation and prediction, and many experiments have verified their outstanding performance in text generation. This paper trains the OpenAI GPT-2 model on two new corpora, uses it to generate long sentences and articles, and performs a comparative analysis of the results. In addition, we use the BERT model to complete the task of predicting intermediate words from the surrounding context.
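As a rough illustration of the generation pipeline the abstract describes, the sketch below fine-tunes GPT-2 on a plain-text corpus and then samples a long continuation from a starting prompt, using the Hugging Face Transformers library. The corpus file name, training hyperparameters, and sampling settings are illustrative assumptions; the paper does not specify its exact configuration.

```python
# Minimal sketch: fine-tune GPT-2 on a new corpus, then generate long text.
# "corpus.txt" and all hyperparameters below are assumptions for illustration.
from transformers import (GPT2LMHeadModel, GPT2Tokenizer,
                          TextDataset, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Build a causal language-modeling dataset from the raw corpus file.
train_dataset = TextDataset(tokenizer=tokenizer,
                            file_path="corpus.txt",  # hypothetical corpus
                            block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-finetuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=4),
    data_collator=collator,
    train_dataset=train_dataset,
)
trainer.train()

# Generate a long continuation from a given starting word/prompt.
prompt = tokenizer.encode("The weather today", return_tensors="pt").to(model.device)
output = model.generate(prompt, max_length=200, do_sample=True,
                        top_k=50, top_p=0.95,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Sampling with top-k/top-p (rather than greedy decoding) is a common choice for long-form generation, since it reduces the repetition that greedy GPT-2 decoding tends to produce.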
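The intermediate-word prediction task can be sketched in the same way with a pretrained BERT masked language model: mask one word in a sentence and rank candidate fills from the bidirectional context. The example sentence and the top-5 cutoff below are assumptions for illustration, not the paper's evaluation setup.

```python
# Minimal sketch: predict a masked intermediate word with BERT's MLM head.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The man went to the [MASK] to buy some milk."  # illustrative sentence
inputs = tokenizer(text, return_tensors="pt")
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits

# Rank the top-5 candidate tokens for the masked position.
top_ids = logits[0, mask_index].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```

Because BERT attends to both the left and right context of the mask, it suits this fill-in task, whereas GPT-2's left-to-right decoding suits open-ended continuation from a starting word.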