A survey of GPT-3 family large language models including ChatGPT and GPT-4

Katikapalli Subramanyam Kalyan
{"title":"A survey of GPT-3 family large language models including ChatGPT and GPT-4","authors":"Katikapalli Subramanyam Kalyan","doi":"10.1016/j.nlp.2023.100048","DOIUrl":null,"url":null,"abstract":"<div><p>Large language models (LLMs) are a special class of pretrained language models (PLMs) obtained by scaling model size, pretraining corpus and computation. LLMs, because of their large size and pretraining on large volumes of text data, exhibit special abilities which allow them to achieve remarkable performances without any task-specific training in many of the natural language processing tasks. The era of LLMs started with OpenAI’s GPT-3 model, and the popularity of LLMs has increased exponentially after the introduction of models like ChatGPT and GPT4. We refer to GPT-3 and its successor OpenAI models, including ChatGPT and GPT4, as GPT-3 family large language models (GLLMs). With the ever-rising popularity of GLLMs, especially in the research community, there is a strong need for a comprehensive survey which summarizes the recent research progress in multiple dimensions and can guide the research community with insightful future research directions. We start the survey paper with foundation concepts like transformers, transfer learning, self-supervised learning, pretrained language models and large language models. We then present a brief overview of GLLMs and discuss the performances of GLLMs in various downstream tasks, specific domains and multiple languages. We also discuss the data labelling and data augmentation abilities of GLLMs, the robustness of GLLMs, the effectiveness of GLLMs as evaluators, and finally, conclude with multiple insightful future research directions. To summarize, this comprehensive survey paper will serve as a good resource for both academic and industry people to stay updated with the latest research related to GLLMs.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"6 ","pages":"Article 100048"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719123000456/pdfft?md5=72753bb0aac6b7c01d0dc8bddfb62121&pid=1-s2.0-S2949719123000456-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Processing Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949719123000456","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Large language models (LLMs) are a special class of pretrained language models (PLMs) obtained by scaling up model size, pretraining corpus, and computation. Because of their large size and pretraining on large volumes of text data, LLMs exhibit special abilities that allow them to achieve remarkable performance on many natural language processing tasks without any task-specific training. The era of LLMs started with OpenAI’s GPT-3 model, and the popularity of LLMs has grown rapidly since the introduction of models like ChatGPT and GPT-4. We refer to GPT-3 and its successor OpenAI models, including ChatGPT and GPT-4, as GPT-3 family large language models (GLLMs). With the ever-rising popularity of GLLMs, especially in the research community, there is a strong need for a comprehensive survey that summarizes recent research progress along multiple dimensions and can guide the community with insightful future research directions. We begin with foundational concepts such as transformers, transfer learning, self-supervised learning, pretrained language models, and large language models. We then present a brief overview of GLLMs and discuss their performance on various downstream tasks, in specific domains, and across multiple languages. We also discuss the data labelling and data augmentation abilities of GLLMs, their robustness, and their effectiveness as evaluators, and we conclude with multiple insightful future research directions. In summary, this survey will serve as a useful resource for both academics and industry practitioners who wish to stay up to date with the latest research on GLLMs.
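
To make the zero-shot ability described above concrete, the sketch below prompts a GPT-3 family model to label sentiment with no task-specific training. This is a minimal illustration, not code from the survey: it assumes the official openai Python client (v1 or later) with an OPENAI_API_KEY set in the environment, and the task, label set, and prompt wording are invented for the example.

# Zero-shot sentiment labelling with a GPT-3 family model.
# Assumes: `pip install openai` (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

def zero_shot_sentiment(text: str) -> str:
    """Return 'positive' or 'negative' using only a prompt, with no fine-tuning."""
    response = client.chat.completions.create(
        model="gpt-4",  # any GLLM chat model works here, e.g. "gpt-3.5-turbo"
        temperature=0,  # keep the labels as deterministic as possible
        messages=[
            {"role": "system",
             "content": "You are a sentiment classifier. Reply with exactly "
                        "one word: positive or negative."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(zero_shot_sentiment("The battery life of this laptop is fantastic."))
# expected output: positive

The same prompt-only pattern underlies the data labelling use case the survey covers: feeding unlabelled corpus items through such a classifier yields silver labels without training a dedicated model.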

