{"title":"解密 ChatGPT:深入了解 OpenAI 的健壮大型语言模型","authors":"Pronaya Bhattacharya, Vivek Kumar Prasad, Ashwin Verma, Deepak Gupta, Assadaporn Sapsomboon, Wattana Viriyasitavat, Gaurav Dhiman","doi":"10.1007/s11831-024-10115-5","DOIUrl":null,"url":null,"abstract":"<div><p>Recent advancements in natural language processing (NLP) have catalyzed the development of models capable of generating coherent and contextually relevant responses. Such models are applied across a diverse array of applications, including but not limited to chatbots, expert systems, question-and-answer robots, and language translation systems. Large Language Models (LLMs), exemplified by OpenAI’s Generative Pretrained Transformer (GPT), have significantly transformed the NLP landscape. They have introduced unparalleled abilities in generating text that is not only contextually appropriate but also semantically rich. This evolution underscores a pivotal shift towards more sophisticated and intuitive language understanding and generation capabilities within the field. Models based on GPT are developed through extensive training on vast datasets, enabling them to grasp patterns akin to human writing styles and deliver insightful responses to intricate questions. These models excel in condensing text, extending incomplete passages, crafting imaginative narratives, and emulating conversational exchanges. However, GPT LLMs are not without their challenges, including ethical dilemmas and the propensity for disseminating misinformation. Additionally, the deployment of these models at a practical scale necessitates a substantial investment in training and computational resources, leading to concerns regarding their sustainability. ChatGPT, a variant rooted in transformer-based architectures, leverages a self-attention mechanism for data sequences and a reinforcement learning-based human feedback (RLHF) system. This enables the model to grasp long-range dependencies, facilitating the generation of contextually appropriate outputs. Despite ChatGPT marking a significant leap forward in NLP technology, there remains a lack of comprehensive discourse on its architecture, efficacy, and inherent constraints. Therefore, this survey aims to elucidate the ChatGPT model, offering an in-depth exploration of its foundational structure and operational efficacy. We meticulously examine Chat-GPT’s architecture and training methodology, alongside a critical analysis of its capabilities in language generation. Our investigation reveals ChatGPT’s remarkable aptitude for producing text indistinguishable from human writing, whilst also acknowledging its limitations and susceptibilities to bias. This analysis is intended to provide a clearer understanding of ChatGPT, fostering a nuanced appreciation of its contributions and challenges within the broader NLP field. We also explore the ethical and societal implications of this technology, and discuss the future of NLP and AI. Our study provides valuable insights into the inner workings of ChatGPT, and helps to shed light on the potential of LLMs for shaping the future of technology and society. 
The approach used as Eco-GPT, with a three-level cascade (GPT-J, J1-G, GPT-4), achieves 73% and 60% cost savings in CaseHold and CoQA datasets, outperforming GPT-4.</p></div>","PeriodicalId":55473,"journal":{"name":"Archives of Computational Methods in Engineering","volume":"31 8","pages":"4557 - 4600"},"PeriodicalIF":9.7000,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Demystifying ChatGPT: An In-depth Survey of OpenAI’s Robust Large Language Models\",\"authors\":\"Pronaya Bhattacharya, Vivek Kumar Prasad, Ashwin Verma, Deepak Gupta, Assadaporn Sapsomboon, Wattana Viriyasitavat, Gaurav Dhiman\",\"doi\":\"10.1007/s11831-024-10115-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Recent advancements in natural language processing (NLP) have catalyzed the development of models capable of generating coherent and contextually relevant responses. Such models are applied across a diverse array of applications, including but not limited to chatbots, expert systems, question-and-answer robots, and language translation systems. Large Language Models (LLMs), exemplified by OpenAI’s Generative Pretrained Transformer (GPT), have significantly transformed the NLP landscape. They have introduced unparalleled abilities in generating text that is not only contextually appropriate but also semantically rich. This evolution underscores a pivotal shift towards more sophisticated and intuitive language understanding and generation capabilities within the field. Models based on GPT are developed through extensive training on vast datasets, enabling them to grasp patterns akin to human writing styles and deliver insightful responses to intricate questions. These models excel in condensing text, extending incomplete passages, crafting imaginative narratives, and emulating conversational exchanges. However, GPT LLMs are not without their challenges, including ethical dilemmas and the propensity for disseminating misinformation. Additionally, the deployment of these models at a practical scale necessitates a substantial investment in training and computational resources, leading to concerns regarding their sustainability. ChatGPT, a variant rooted in transformer-based architectures, leverages a self-attention mechanism for data sequences and a reinforcement learning-based human feedback (RLHF) system. This enables the model to grasp long-range dependencies, facilitating the generation of contextually appropriate outputs. Despite ChatGPT marking a significant leap forward in NLP technology, there remains a lack of comprehensive discourse on its architecture, efficacy, and inherent constraints. Therefore, this survey aims to elucidate the ChatGPT model, offering an in-depth exploration of its foundational structure and operational efficacy. We meticulously examine Chat-GPT’s architecture and training methodology, alongside a critical analysis of its capabilities in language generation. Our investigation reveals ChatGPT’s remarkable aptitude for producing text indistinguishable from human writing, whilst also acknowledging its limitations and susceptibilities to bias. This analysis is intended to provide a clearer understanding of ChatGPT, fostering a nuanced appreciation of its contributions and challenges within the broader NLP field. We also explore the ethical and societal implications of this technology, and discuss the future of NLP and AI. 
Our study provides valuable insights into the inner workings of ChatGPT, and helps to shed light on the potential of LLMs for shaping the future of technology and society. The approach used as Eco-GPT, with a three-level cascade (GPT-J, J1-G, GPT-4), achieves 73% and 60% cost savings in CaseHold and CoQA datasets, outperforming GPT-4.</p></div>\",\"PeriodicalId\":55473,\"journal\":{\"name\":\"Archives of Computational Methods in Engineering\",\"volume\":\"31 8\",\"pages\":\"4557 - 4600\"},\"PeriodicalIF\":9.7000,\"publicationDate\":\"2024-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Archives of Computational Methods in Engineering\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s11831-024-10115-5\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Archives of Computational Methods in Engineering","FirstCategoryId":"5","ListUrlMain":"https://link.springer.com/article/10.1007/s11831-024-10115-5","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Demystifying ChatGPT: An In-depth Survey of OpenAI’s Robust Large Language Models
Recent advancements in natural language processing (NLP) have catalyzed the development of models capable of generating coherent and contextually relevant responses. Such models are applied across a diverse array of applications, including but not limited to chatbots, expert systems, question-answering systems, and language translation systems. Large Language Models (LLMs), exemplified by OpenAI’s Generative Pretrained Transformer (GPT), have significantly transformed the NLP landscape. They have introduced unparalleled abilities in generating text that is not only contextually appropriate but also semantically rich. This evolution underscores a pivotal shift towards more sophisticated and intuitive language understanding and generation capabilities within the field. GPT-based models are developed through extensive training on vast datasets, enabling them to grasp patterns akin to human writing styles and deliver insightful responses to intricate questions. These models excel at condensing text, extending incomplete passages, crafting imaginative narratives, and emulating conversational exchanges. However, GPT LLMs are not without their challenges, including ethical dilemmas and a propensity for disseminating misinformation. Additionally, deploying these models at practical scale requires substantial investment in training and computational resources, raising concerns about their sustainability. ChatGPT, a variant rooted in transformer-based architectures, combines a self-attention mechanism over input sequences with reinforcement learning from human feedback (RLHF). This enables the model to capture long-range dependencies and generate contextually appropriate outputs. Despite ChatGPT marking a significant leap forward in NLP technology, there remains a lack of comprehensive discourse on its architecture, efficacy, and inherent constraints. This survey therefore aims to elucidate the ChatGPT model, offering an in-depth exploration of its foundational structure and operational efficacy. We examine ChatGPT’s architecture and training methodology, alongside a critical analysis of its capabilities in language generation. Our investigation reveals ChatGPT’s remarkable aptitude for producing text indistinguishable from human writing, while also acknowledging its limitations and susceptibility to bias. This analysis is intended to provide a clearer understanding of ChatGPT, fostering a nuanced appreciation of its contributions and challenges within the broader NLP field. We also explore the ethical and societal implications of this technology and discuss the future of NLP and AI. Our study provides valuable insights into the inner workings of ChatGPT and helps shed light on the potential of LLMs to shape the future of technology and society. Among the surveyed approaches, Eco-GPT, a three-level cascade (GPT-J, J1-G, GPT-4), achieves 73% and 60% cost savings on the CaseHold and CoQA datasets, respectively, while outperforming GPT-4.
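To make the self-attention mechanism mentioned in the abstract concrete, the following is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is illustrative only: the dimensions and weight matrices are invented for the example, and production GPT models add multiple heads, causal masking, and positional information.

```python
# Minimal sketch of scaled dot-product self-attention, the building block
# of transformer models such as GPT. Illustrative only; not the actual
# ChatGPT implementation.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of token embeddings.

    X          : (seq_len, d_model) token embeddings
    Wq, Wk, Wv : (d_model, d_k) learned projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise token affinities
    weights = softmax(scores, axis=-1)       # each token attends to every token
    return weights @ V                       # context-aware representations

# Toy usage: 4 tokens with 8-dimensional embeddings.
# (GPT additionally applies a causal mask so tokens attend only to earlier positions.)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token in one step, the mechanism captures the long-range dependencies the abstract refers to, rather than propagating information position by position as recurrent models do.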
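The Eco-GPT cost savings quoted above come from cascading models by price: answer with a cheap model when it is confident, and escalate to an expensive model only when necessary. The sketch below shows this general pattern; the `Tier` class, the confidence scores, and the cost numbers are all hypothetical scaffolding, not Eco-GPT’s actual implementation or any real API.

```python
# Minimal sketch of a cost-saving LLM cascade in the spirit of Eco-GPT:
# try cheap models first, escalate only on low confidence. All interfaces
# here are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tier:
    name: str             # e.g. "GPT-J", "J1-G", "GPT-4"
    cost_per_call: float  # relative cost units
    query: Callable[[str], tuple[str, float]]  # returns (answer, confidence)

def cascade(prompt: str, tiers: list[Tier], threshold: float = 0.8):
    """Walk the cascade from cheapest to most expensive tier.

    Accept the first answer whose confidence clears the threshold;
    otherwise fall through. The last tier's answer is always accepted.
    """
    spent = 0.0
    for i, tier in enumerate(tiers):
        answer, confidence = tier.query(prompt)
        spent += tier.cost_per_call
        if confidence >= threshold or i == len(tiers) - 1:
            return answer, tier.name, spent

# Toy usage with stubbed models; a real deployment would call actual APIs
# and learn the confidence scorer from data.
tiers = [
    Tier("GPT-J", 1.0,  lambda p: ("draft answer", 0.60)),
    Tier("J1-G",  5.0,  lambda p: ("better answer", 0.85)),
    Tier("GPT-4", 30.0, lambda p: ("best answer", 0.99)),
]
answer, used, cost = cascade("Which holding applies here?", tiers)
print(used, cost)  # J1-G 6.0 -- GPT-4 never invoked, hence the savings
```

Savings arise whenever an early tier answers confidently, since the expensive model is simply never called for that query; on datasets where cheap models handle most queries, aggregate cost drops sharply while a well-tuned threshold preserves (or, as reported, improves) accuracy.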
Journal Introduction:
Archives of Computational Methods in Engineering
Aim and Scope:
Archives of Computational Methods in Engineering serves as an active forum for disseminating research and advanced practices in computational engineering, with particular focus on mechanics and related fields. The journal emphasizes extended state-of-the-art reviews in selected areas, a distinctive feature of the publication.
Review Format:
Reviews published in the journal offer:
A survey of current literature
Critical exposition of topics in their full complexity
This organization allows readers to quickly grasp the focus, coverage, and unique features of the Archives of Computational Methods in Engineering.