Generative AI for scalable feedback to multimodal exercises

IF 7.5, Zone 2 (Management), JCR Q1 (Business)

International Journal of Research in Marketing Pub Date : 2024-05-22 DOI:10.1016/j.ijresmar.2024.05.005

Volume 41, Issue 3, Pages 468-488. Publication type: Journal Article. Article URL: https://www.sciencedirect.com/science/article/pii/S0167811624000430

Citations: 0

Abstract

Detailed feedback on exercises helps learners become proficient, but it is time-consuming for educators and therefore hard to scale. This manuscript evaluates how well Generative Artificial Intelligence (AI) provides automated feedback on complex multimodal exercises requiring coding, statistics, and economic reasoning. Besides providing this technology through an easily accessible web application, the article evaluates its performance by comparing the quantitative feedback (i.e., points achieved) from Generative AI models with human expert feedback for 4,349 solutions to marketing analytics exercises. The results show that automated feedback produced by Generative AI (GPT-4) provides almost unbiased evaluations, correlates highly with human evaluations (r = 0.94), and deviates from them by only 6%. GPT-4 performs best among seven Generative AI models, albeit at the highest cost. Comparing the models’ performance with their costs shows that GPT-4, Mistral Large, Claude 3 Opus, and Gemini 1.0 Pro dominate the three other Generative AI models (Claude 3 Sonnet, GPT-3.5, and Gemini 1.5 Pro). Expert assessment of the qualitative feedback (i.e., the AI’s textual response) indicates that it is mostly correct, sufficient, and appropriate for learners. A survey of marketing analytics learners shows that they highly recommend the app and its Generative AI feedback. An advantage of the app is that it is subject-agnostic: it does not require any subject- or exercise-specific training. Thus, it is immediately usable for new exercises in marketing analytics and other subjects.
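The comparison of AI-assigned and human-assigned points described in the abstract can be illustrated with a small sketch. The abstract does not define its metrics precisely, so the code below rests on assumptions: bias is taken as the mean signed difference and deviation as the mean absolute difference, both as a share of the maximum points, and "dominates" is read as being at least as accurate and at least as cheap, and strictly better on one of the two. All function names, model names, and numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def agreement(ai_points, human_points, max_points):
    """Summarize how closely AI grades track human grades.

    Assumed definitions (not confirmed by the abstract):
    bias    = mean(ai - human) / max_points
    abs_dev = mean(|ai - human|) / max_points
    """
    n = len(ai_points)
    diffs = [a - h for a, h in zip(ai_points, human_points)]
    return {
        "r": pearson_r(ai_points, human_points),
        "bias": sum(diffs) / n / max_points,
        "abs_dev": sum(abs(d) for d in diffs) / n / max_points,
    }


def dominated(models):
    """Return the names of models that another model dominates.

    `models` maps a name to (abs_dev, cost); lower is better for both.
    """
    losers = []
    for name, (dev, cost) in models.items():
        for other, (o_dev, o_cost) in models.items():
            if other == name:
                continue
            no_worse = o_dev <= dev and o_cost <= cost
            strictly_better = o_dev < dev or o_cost < cost
            if no_worse and strictly_better:
                losers.append(name)
                break
    return losers


if __name__ == "__main__":
    # Toy grades for five solutions to one 10-point exercise (hypothetical).
    human = [10, 8, 6, 4, 2]
    ai = [9, 8, 7, 4, 2]
    print(agreement(ai, human, max_points=10))

    # Hypothetical (deviation, cost) pairs for three unnamed models.
    toy_models = {
        "model_a": (0.06, 10.0),
        "model_b": (0.08, 2.0),
        "model_c": (0.09, 5.0),
    }
    print(dominated(toy_models))
```

In this toy data, model_c is dominated because model_b is both more accurate and cheaper, while model_a survives as the most accurate option despite its higher cost, which mirrors the kind of trade-off the abstract reports for GPT-4.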


Source journal

International Journal of Research in Marketing (Business)

CiteScore: 11.80

Self-citation rate: 4.30%

Articles published: (not listed)

Review time: 66 days

About the journal: The International Journal of Research in Marketing is an international, double-blind peer-reviewed journal for marketing academics and practitioners. Building on a great tradition of global marketing scholarship, IJRM aims to contribute substantially to the field of marketing research by providing a high-quality medium for the dissemination of new marketing knowledge and methods. IJRM's target audience includes marketing scholars, practitioners (e.g., marketing research and consulting professionals), and other interested groups and individuals.