Oscar R. Navarrete-Parra, Víctor Uc Cetina, Jorge Reyes-Magaña
{"title":"使用强化学习将英语的中型GPT模型与西班牙语的小型封闭域对齐","authors":"Oscar R. Navarrete-Parra, Víctor Uc Cetina, Jorge Reyes-Magaña","doi":"10.48550/arXiv.2303.17649","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a methodology to align a medium-sized GPT model, originally trained in English for an open domain, to a small closed domain in Spanish. The application for which the model is finely tuned is the question answering task. To achieve this we also needed to train and implement another neural network (which we called the reward model) that could score and determine whether an answer is appropriate for a given question. This component served to improve the decoding and generation of the answers of the system. Numerical metrics such as BLEU and perplexity were used to evaluate the model, and human judgment was also used to compare the decoding technique with others. Finally, the results favored the proposed method, and it was determined that it is feasible to use a reward model to align the generation of responses.","PeriodicalId":258781,"journal":{"name":"Proces. del Leng. Natural","volume":"7 Suppl 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Aligning a medium-size GPT model in English to a small closed domain in Spanish using reinforcement learning\",\"authors\":\"Oscar R. Navarrete-Parra, Víctor Uc Cetina, Jorge Reyes-Magaña\",\"doi\":\"10.48550/arXiv.2303.17649\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose a methodology to align a medium-sized GPT model, originally trained in English for an open domain, to a small closed domain in Spanish. The application for which the model is finely tuned is the question answering task. To achieve this we also needed to train and implement another neural network (which we called the reward model) that could score and determine whether an answer is appropriate for a given question. This component served to improve the decoding and generation of the answers of the system. Numerical metrics such as BLEU and perplexity were used to evaluate the model, and human judgment was also used to compare the decoding technique with others. Finally, the results favored the proposed method, and it was determined that it is feasible to use a reward model to align the generation of responses.\",\"PeriodicalId\":258781,\"journal\":{\"name\":\"Proces. del Leng. Natural\",\"volume\":\"7 Suppl 1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proces. del Leng. Natural\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2303.17649\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proces. del Leng. Natural","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2303.17649","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Aligning a medium-size GPT model in English to a small closed domain in Spanish using reinforcement learning
In this paper, we propose a methodology to align a medium-sized GPT model, originally trained in English for an open domain, to a small closed domain in Spanish. The application for which the model is fine-tuned is question answering. To achieve this, we also needed to train and implement another neural network (which we call the reward model) that scores whether an answer is appropriate for a given question. This component served to improve the system's decoding and answer generation. Numerical metrics such as BLEU and perplexity were used to evaluate the model, and human judgment was also used to compare the proposed decoding technique with alternatives. The results favored the proposed method, showing that it is feasible to use a reward model to align the generation of responses.
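The abstract describes using the reward model at decoding time to pick better answers. The following is a minimal sketch of that idea as reward-model reranking of sampled candidates; it is not the authors' exact pipeline, and the checkpoint names, the sequence-classification reward head, and the sampling parameters are illustrative assumptions.

```python
# Sketch: rerank sampled answers with a learned reward model.
# Checkpoints and hyperparameters below are assumptions, not the paper's.
import torch
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          AutoModelForSequenceClassification)

# Hypothetical models: a fine-tuned medium GPT generator and a scalar reward model.
gen_tok = AutoTokenizer.from_pretrained("gpt2-medium")
generator = AutoModelForCausalLM.from_pretrained("gpt2-medium")
rm_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=1)  # one scalar score per answer

def answer(question: str, n_candidates: int = 8, max_new_tokens: int = 64) -> str:
    """Sample several candidate answers, return the one the reward model scores highest."""
    inputs = gen_tok(question, return_tensors="pt")
    outputs = generator.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        num_return_sequences=n_candidates,
        max_new_tokens=max_new_tokens,
        pad_token_id=gen_tok.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    candidates = [gen_tok.decode(seq[prompt_len:], skip_special_tokens=True)
                  for seq in outputs]
    # Score each (question, answer) pair and keep the best-scoring candidate.
    scores = []
    for cand in candidates:
        enc = rm_tok(question, cand, return_tensors="pt", truncation=True)
        with torch.no_grad():
            scores.append(reward_model(**enc).logits.squeeze().item())
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```

In a full alignment setup the same reward model would also supply the reinforcement-learning signal during fine-tuning; the sketch only illustrates the decoding-side use described in the abstract.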