{"title":"A Text Generation and Prediction System: Pre-training on New Corpora Using BERT and GPT-2","authors":"Yuanbin Qu, Peihan Liu, Wei Song, Lizhen Liu, Miaomiao Cheng","doi":"10.1109/ICEIEC49280.2020.9152352","DOIUrl":null,"url":null,"abstract":"Using a given starting word to make a sentence or filling in sentences is an important direction of natural language processing. From one aspect, it reflects whether the machine can have human thinking and creativity. We train the machine for specific tasks and then use it in natural language processing, which will help solve some sentence generation problems, especially for application scenarios such as summary generation, machine translation, and automatic question answering. The OpenAI GPT-2 and BERT models are currently widely used language models for text generation and prediction. There have been many experiments to verify the outstanding performance of these two models in the field of text generation. This paper will use two new corpora to train OpenAI GPT-2 model, used to generate long sentences and articles, and finally perform a comparative analysis. At the same time, we will use the BERT model to complete the task of predicting intermediate words based on the context.","PeriodicalId":352285,"journal":{"name":"2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICEIEC49280.2020.9152352","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 25
Abstract
Generating a sentence from a given starting word, or filling in missing words within a sentence, is an important direction in natural language processing. To some extent, it reflects whether a machine can exhibit human-like thinking and creativity. Training a machine for specific tasks and then applying it to natural language processing helps solve sentence-generation problems, especially in application scenarios such as summary generation, machine translation, and automatic question answering. OpenAI GPT-2 and BERT are currently widely used language models for text generation and prediction, and many experiments have verified their outstanding performance in the field of text generation. This paper uses two new corpora to train the OpenAI GPT-2 model to generate long sentences and articles, and then performs a comparative analysis. In addition, we use the BERT model to predict intermediate words from the surrounding context.
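
To make the generation task concrete, below is a minimal sketch of prompting GPT-2 to continue a starting word. The paper does not specify its tooling; the Hugging Face transformers library, the stock "gpt2" checkpoint, and the sampling parameters here are illustrative assumptions (the paper pre-trains on its own new corpora rather than using the stock checkpoint).

```python
# Minimal sketch: continue a starting word with GPT-2.
# Assumptions: Hugging Face "transformers" library, stock "gpt2" checkpoint,
# and arbitrary sampling parameters -- none of these come from the paper.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode a starting word/prompt and sample a continuation.
prompt = "The weather"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids,
    max_length=50,          # total length: prompt + generated tokens
    do_sample=True,         # sample rather than decode greedily
    top_k=50,               # restrict sampling to the 50 most likely tokens
    temperature=0.9,        # soften the next-token distribution slightly
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```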
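
Likewise, the intermediate-word prediction task maps onto BERT's masked language modeling objective: mask a position in a sentence and score candidate fillers from the context on both sides. The sketch below again assumes the Hugging Face transformers library and the stock "bert-base-uncased" checkpoint, not the paper's own trained model.

```python
# Minimal sketch: predict a masked intermediate word with BERT.
# Assumptions: Hugging Face "transformers" library and the stock
# "bert-base-uncased" checkpoint, used here purely for illustration.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The man went to the [MASK] to buy milk."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the top-5 candidate tokens there.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_index].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```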