Evaluation of Code Generation for Simulating Participant Behavior in Experience Sampling Method by Iterative In-Context Learning of a Large Language Model

Alireza Khanshan, Pieter van Gorp, P. Markopoulos
{"title":"Evaluation of Code Generation for Simulating Participant Behavior in Experience Sampling Method by Iterative In-Context Learning of a Large Language Model","authors":"Alireza Khanshan, Pieter van Gorp, P. Markopoulos","doi":"10.1145/3661143","DOIUrl":null,"url":null,"abstract":"The Experience Sampling Method (ESM) is commonly used to understand behaviors, thoughts, and feelings in the wild by collecting self-reports. Sustaining sufficient response rates, especially in long-running studies remains challenging. To avoid low response rates and dropouts, experimenters rely on their experience, proposed methodologies from earlier studies, trial and error, or the scarcely available participant behavior data from previous ESM protocols. This approach often fails in finding the acceptable study parameters, resulting in redesigning the protocol and repeating the experiment. Research has shown the potential of machine learning to personalize ESM protocols such that ESM prompts are delivered at opportune moments, leading to higher response rates. The corresponding training process is hindered due to the scarcity of open data in the ESM domain, causing a cold start, which could be mitigated by simulating participant behavior. Such simulations provide training data and insights for the experimenters to update their study design choices. Creating this simulation requires behavioral science, psychology, and programming expertise. Large language models (LLMs) have emerged as facilitators for information inquiry and programming, albeit random and occasionally unreliable. We aspire to assess the readiness of LLMs in an ESM use case. We conducted research using GPT-3.5 turbo-16k to tackle an ESM simulation problem. We explored several prompt design alternatives to generate ESM simulation programs, evaluated the output code in terms of semantics and syntax, and interviewed ESM practitioners. We found that engineering LLM-enabled ESM simulations have the potential to facilitate data generation, but they perpetuate trust and reliability challenges.","PeriodicalId":36902,"journal":{"name":"Proceedings of the ACM on Human-Computer Interaction","volume":"12 40","pages":"1 - 19"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Human-Computer Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3661143","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Social Sciences","Score":null,"Total":0}

Abstract

The Experience Sampling Method (ESM) is commonly used to understand behaviors, thoughts, and feelings in the wild by collecting self-reports. Sustaining sufficient response rates, especially in long-running studies, remains challenging. To avoid low response rates and dropouts, experimenters rely on their experience, methodologies proposed in earlier studies, trial and error, or the scarce participant behavior data available from previous ESM protocols. This approach often fails to find acceptable study parameters, forcing experimenters to redesign the protocol and repeat the experiment. Research has shown the potential of machine learning to personalize ESM protocols so that ESM prompts are delivered at opportune moments, leading to higher response rates. The corresponding training process is hindered by the scarcity of open data in the ESM domain, causing a cold-start problem that could be mitigated by simulating participant behavior. Such simulations provide training data and insights that help experimenters update their study design choices. Creating such a simulation requires expertise in behavioral science, psychology, and programming. Large language models (LLMs) have emerged as facilitators for information inquiry and programming, albeit with stochastic and occasionally unreliable output. We aspire to assess the readiness of LLMs for an ESM use case. We conducted research using GPT-3.5-turbo-16k to tackle an ESM simulation problem. We explored several prompt design alternatives for generating ESM simulation programs, evaluated the output code in terms of syntax and semantics, and interviewed ESM practitioners. We found that LLM-enabled ESM simulations have the potential to facilitate data generation, but they perpetuate trust and reliability challenges.
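
To make the workflow the abstract describes more concrete, below is a minimal sketch of an iterative in-context learning loop: the model is asked to generate an ESM participant-behavior simulation, the reply is syntax-checked, and any error is fed back into the conversation for another attempt. This is an illustrative reconstruction, not the authors' pipeline; the task wording, retry limit, and helper names (strip_fences, generate_simulation) are assumptions, and only the model name (gpt-3.5-turbo-16k) comes from the paper. It assumes the openai Python package (v1+) and an OPENAI_API_KEY set in the environment.

import ast

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical task description; the paper's actual prompts are not reproduced here.
TASK = (
    "Write a self-contained Python program that simulates one participant in an "
    "Experience Sampling Method study: deliver 5 prompts per day for 14 days and "
    "decide per prompt whether the participant responds, with response probability "
    "declining over the course of the study. Reply with code only."
)

def strip_fences(text: str) -> str:
    """Drop a Markdown code fence if the model wrapped its reply in one."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].startswith("```"):
        lines = lines[:-1]
    return "\n".join(lines)

def generate_simulation(max_rounds: int = 3) -> str:
    """Iteratively prompt the model, feeding syntax errors back as new context."""
    messages = [{"role": "user", "content": TASK}]
    for _ in range(max_rounds):
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo-16k",  # model used in the paper (since deprecated by OpenAI)
            messages=messages,
        )
        code = strip_fences(reply.choices[0].message.content or "")
        try:
            ast.parse(code)  # catches syntax errors only; semantics need human review
            return code
        except SyntaxError as err:
            # In-context repair: show the model its own attempt plus the error.
            messages.append({"role": "assistant", "content": code})
            messages.append(
                {"role": "user", "content": f"That code raised {err!r}. Please fix it."}
            )
    raise RuntimeError("no syntactically valid program after retries")

Note that ast.parse covers only the syntactic half of the evaluation the abstract mentions; judging whether a generated simulation is semantically plausible (realistic response rates and dropout behavior) is precisely the part that, per the abstract, still demands behavioral-science expertise and raises the trust and reliability concerns the authors report.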