{"title":"A Text Generation and Prediction System: Pre-training on New Corpora Using BERT and GPT-2","authors":"Yuanbin Qu, Peihan Liu, Wei Song, Lizhen Liu, Miaomiao Cheng","doi":"10.1109/ICEIEC49280.2020.9152352","DOIUrl":null,"url":null,"abstract":"Using a given starting word to make a sentence or filling in sentences is an important direction of natural language processing. From one aspect, it reflects whether the machine can have human thinking and creativity. We train the machine for specific tasks and then use it in natural language processing, which will help solve some sentence generation problems, especially for application scenarios such as summary generation, machine translation, and automatic question answering. The OpenAI GPT-2 and BERT models are currently widely used language models for text generation and prediction. There have been many experiments to verify the outstanding performance of these two models in the field of text generation. This paper will use two new corpora to train OpenAI GPT-2 model, used to generate long sentences and articles, and finally perform a comparative analysis. At the same time, we will use the BERT model to complete the task of predicting intermediate words based on the context.","PeriodicalId":352285,"journal":{"name":"2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICEIEC49280.2020.9152352","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 25
Abstract
Generating a sentence from a given starting word, or filling in missing words within a sentence, is an important direction in natural language processing. To some extent, it reflects whether a machine can exhibit human-like thinking and creativity. Training a machine for specific tasks and then applying it to natural language processing helps solve sentence-generation problems, especially in application scenarios such as summary generation, machine translation, and automatic question answering. OpenAI GPT-2 and BERT are currently widely used language models for text generation and prediction, and many experiments have verified their outstanding performance in the field of text generation. This paper uses two new corpora to train the OpenAI GPT-2 model to generate long sentences and articles, and then performs a comparative analysis. In addition, we use the BERT model to predict intermediate words from the surrounding context.
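
To make the generation task concrete, below is a minimal sketch of prompting GPT-2 to continue a starting word. The paper does not specify its tooling; the Hugging Face transformers library, the stock "gpt2" checkpoint, and the sampling parameters here are illustrative assumptions (the paper pre-trains on its own new corpora rather than using the stock checkpoint).

```python
# Minimal sketch: continue a starting word with GPT-2.
# Assumptions: Hugging Face "transformers" library, stock "gpt2" checkpoint,
# and arbitrary sampling parameters -- none of these come from the paper.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode a starting word/prompt and sample a continuation.
prompt = "The weather"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids,
    max_length=50,          # total length: prompt + generated tokens
    do_sample=True,         # sample rather than decode greedily
    top_k=50,               # restrict sampling to the 50 most likely tokens
    temperature=0.9,        # soften the next-token distribution slightly
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```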
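
Likewise, the intermediate-word prediction task maps onto BERT's masked language modeling objective: mask a position in a sentence and score candidate fillers from the context on both sides. The sketch below again assumes the Hugging Face transformers library and the stock "bert-base-uncased" checkpoint, not the paper's own trained model.

```python
# Minimal sketch: predict a masked intermediate word with BERT.
# Assumptions: Hugging Face "transformers" library and the stock
# "bert-base-uncased" checkpoint, used here purely for illustration.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The man went to the [MASK] to buy milk."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the top-5 candidate tokens there.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_index].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```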