Real Customization or Just Marketing: Are Customized Versions of Generative AI Useful?

JCR: Q2 (Pharmacology, Toxicology and Pharmaceutics)

F1000Research. Pub Date: 2024-10-17. eCollection Date: 2024-01-01. DOI: 10.12688/f1000research.153129.2

Eduardo C Garrido-Merchán, Jose Luis Arroyo-Barrigüete, Francisco Borrás-Pala, Leandro Escobar-Torres, Carlos Martínez de Ibarreta, Jose María Ortíz-Lozano, Antonio Rua-Vieites

F1000Research, volume 13, page 791. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11447677/pdf/

Citations: 0

Abstract

Real Customization or Just Marketing: Are Customized Versions of Generative AI Useful?

Background: Large Language Models (LLMs), such as OpenAI™ ChatGPT-4™ Turbo, are revolutionizing several industries, including higher education. In this context, LLMs can be personalised through a customization process to meet student demands in a particular subject, such as statistics. Recently, OpenAI made it possible to customize its model through a natural-language web interface, enabling the creation of customised GPT versions deliberately conditioned to meet the demands of a specific task.

Methods: This preliminary research aims to assess the potential of customised GPTs. A Business Statistics Virtual Professor (BSVP) was developed for students at the Universidad Pontificia Comillas, and its behaviour was evaluated and compared with that of ChatGPT-4 Turbo. First, each professor collected 15-30 genuine student questions from "Statistics and Probability" and "Business Statistics" courses across seven degrees, primarily from second-year courses. Second, these questions, often ambiguous and imprecise, were posed to ChatGPT-4 Turbo and BSVP, and only the initial responses were recorded, with no follow-ups. Third, professors blindly evaluated the responses on a 0-10 scale, considering quality, depth, and personalization. Finally, a statistical comparison of the systems' performance was conducted.
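The blind evaluation step can be illustrated with a minimal sketch. The abstract does not describe how blinding was implemented, so everything here is an assumption made for illustration: the function name `build_blind_sheet`, the per-question random ordering, and the separate answer key are all hypothetical.

```python
import random


def build_blind_sheet(questions, answers_gpt, answers_bsvp, seed=0):
    """Return (sheet, key).

    sheet: one dict per question with the two responses in random order
           and no system names, so a grader cannot tell which is which.
    key:   per-question tuple naming the system behind response_1 and
           response_2, kept apart from the sheet until grading is done.
    Hypothetical helper; not the procedure documented in the paper.
    """
    rng = random.Random(seed)  # fixed seed keeps the sheet reproducible
    sheet, key = [], []
    for question, gpt, bsvp in zip(questions, answers_gpt, answers_bsvp):
        pair = [("ChatGPT-4 Turbo", gpt), ("BSVP", bsvp)]
        rng.shuffle(pair)  # random position for each system, per question
        sheet.append({
            "question": question,
            "response_1": pair[0][1],
            "response_2": pair[1][1],
        })
        key.append((pair[0][0], pair[1][0]))
    return sheet, key
```

Graders would score `response_1` and `response_2` on the 0-10 scale, and the key would be joined back only afterwards to attribute each score to a system.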

Results: The results lead to several conclusions. Firstly, a substantial modification in the style of communication was observed. Following the instructions it was trained with, BSVP responded in a more relatable and friendly tone, even incorporating a few minor jokes. Secondly, when explicitly asked for something like, "I would like to practice a programming exercise similar to those in R practice 4," BSVP could provide a far superior response. Lastly, regarding overall performance, quality, depth, and alignment with the specific content of the course, no statistically significant differences were observed in the responses between BSVP and ChatGPT-4 Turbo.
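One way to check for a statistically significant difference between paired scores is sketched below. The abstract does not name the test that was used, so the paired sign-flip permutation test here is an assumed stand-in, and `paired_permutation_test` is a hypothetical helper. It expects two equal-length lists of 0-10 scores, one pair per question.

```python
import random
from statistics import mean


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired scores.

    Returns (mean_difference, p_value). Under the null hypothesis that
    both systems are equally good, the sign of each per-question
    difference is arbitrary, so signs are flipped at random to build
    the reference distribution of the mean difference.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired, one pair per question")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return mean(diffs), (hits + 1) / (n_resamples + 1)
```

A large p-value from such a test would be consistent with the reported absence of a significant difference in overall scores, although it would not by itself show that the two systems are equivalent.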

Conclusions: It appears that customised assistants trained with prompts present advantages as virtual aids for students, yet they do not constitute a substantial improvement over ChatGPT-4 Turbo.

Source journal

F1000Research (Pharmacology, Toxicology and Pharmaceutics: Pharmacology, Toxicology and Pharmaceutics (all))

CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 1646
Review time: 1 week

About the journal: F1000Research publishes articles and other research outputs reporting basic scientific, scholarly, translational and clinical research across the physical and life sciences, engineering, medicine, social sciences and humanities. F1000Research is a scholarly publication platform set up for the scientific, scholarly and medical research community; each article has at least one author who is a qualified researcher, scholar or clinician actively working in their speciality and who has made a key contribution to the article. Articles must be original (not duplications). All research is suitable irrespective of the perceived level of interest or novelty; confirmatory and negative results, as well as null studies, are welcome. F1000Research publishes different types of research, including clinical trials, systematic reviews, software tools, method articles, and many others. Reviews and Opinion articles providing a balanced and comprehensive overview of the latest discoveries in a particular field, or presenting a personal perspective on recent developments, are also welcome.