Eduardo C Garrido-Merchán, Jose Luis Arroyo-Barrigüete, Francisco Borrás-Pala, Leandro Escobar-Torres, Carlos Martínez de Ibarreta, Jose María Ortíz-Lozano, Antonio Rua-Vieites
Journal: F1000Research (Journal Article)
Publication date: 2024-10-17
DOI: https://doi.org/10.12688/f1000research.153129.2
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11447677/pdf/
Real Customization or Just Marketing: Are Customized Versions of Generative AI Useful?
Background: Large Language Models (LLMs), such as OpenAI's ChatGPT-4 Turbo, are revolutionizing several industries, including higher education. In this context, LLMs can be personalised through a customization process to meet student needs in a particular subject, such as statistics. Recently, OpenAI launched the possibility of customizing their model through a natural language web interface, enabling the creation of customised GPT versions deliberately conditioned to meet the demands of a specific task.
Methods: This preliminary study assesses the potential of customised GPTs. A Business Statistics Virtual Professor (BSVP), tailored for students at the Universidad Pontificia Comillas, was developed, and its behaviour was evaluated and compared with that of ChatGPT-4 Turbo. First, each professor collected 15-30 genuine student questions from "Statistics and Probability" and "Business Statistics" courses across seven degrees, primarily from second-year courses. Second, these questions, often ambiguous and imprecise, were posed to ChatGPT-4 Turbo and BSVP, and only their initial responses were recorded, without follow-ups. Third, professors blindly evaluated the responses on a 0-10 scale, considering quality, depth, and personalization. Finally, a statistical comparison of the systems' performance was conducted.
Results: The results lead to several conclusions. Firstly, a substantial modification in the style of communication was observed. Following the instructions it was trained with, BSVP responded in a more relatable and friendly tone, even incorporating a few minor jokes. Secondly, when explicitly asked for something like, "I would like to practice a programming exercise similar to those in R practice 4," BSVP could provide a far superior response. Lastly, regarding overall performance, quality, depth, and alignment with the specific content of the course, no statistically significant differences were observed in the responses between BSVP and ChatGPT-4 Turbo.
Conclusions: It appears that customised assistants trained with prompts present advantages as virtual aids for students, yet they do not constitute a substantial improvement over ChatGPT-4 Turbo.
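The final step of the Methods, a statistical comparison of blind 0-10 ratings given to the two systems on the same questions, can be illustrated with a minimal sketch. The abstract does not say which test the authors used, so the paired sign-flip permutation test below is an assumption chosen only for illustration, and the function name `paired_permutation_test` and all scores are hypothetical, not data from the study.

```python
import random
from statistics import mean


def paired_permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired differences.

    Illustrative only: the abstract does not name the test the authors used.
    Returns (observed mean difference a - b, approximate p-value).
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = mean(diffs)
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the two labels are exchangeable within
        # each pair, so each difference is equally likely to be +d or -d.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical ratings for the same ten questions (NOT data from the study).
bsvp_scores = [7, 8, 6, 9, 7, 8, 5, 7, 8, 6]
gpt4_scores = [7, 7, 6, 8, 8, 8, 6, 7, 7, 6]

mean_diff, p = paired_permutation_test(bsvp_scores, gpt4_scores)
print(f"mean difference (BSVP - ChatGPT-4 Turbo): {mean_diff:.2f}")
print(f"two-sided p-value: {p:.3f}")
```

A paired design fits the described protocol because both systems answered the same questions; a large p-value here would correspond to the kind of "no statistically significant difference" finding reported in the Results.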
Journal: F1000Research
Subject area: Pharmacology, Toxicology and Pharmaceutics (all)
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 1646
Review time: 1 week
About the journal:
F1000Research publishes articles and other research outputs reporting basic scientific, scholarly, translational and clinical research across the physical and life sciences, engineering, medicine, social sciences and humanities. F1000Research is a scholarly publication platform set up for the scientific, scholarly and medical research community; each article has at least one author who is a qualified researcher, scholar or clinician actively working in their speciality and who has made a key contribution to the article. Articles must be original (not duplications). All research is suitable irrespective of the perceived level of interest or novelty; confirmatory and negative results, as well as null studies, are welcome. F1000Research publishes different types of research, including clinical trials, systematic reviews, software tools, method articles, and many others. Reviews and Opinion articles providing a balanced and comprehensive overview of the latest discoveries in a particular field, or presenting a personal perspective on recent developments, are also welcome.