Predicting the Next Response: Demonstrating the Utility of Integrating Artificial Intelligence-Based Reinforcement Learning with Behavior Science
Authors: David J Cox, Carlos Santos
DOI: 10.1007/s40614-025-00444-6 (https://doi.org/10.1007/s40614-025-00444-6)
Journal: Perspectives on Behavior Science, 48(2), 241-267
Published: 2025-04-30 (eCollection 2025-06-01)
Journal Impact Factor: 2.5 (JCR Q2, Psychology, Clinical)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12162397/pdf/
Citations: 0
Abstract
The concepts of reinforcement and punishment arose in two disparate scientific domains: psychology and artificial intelligence (AI). Behavior scientists study how biological organisms do behave as a function of their environment, whereas AI focuses on how artificial agents should behave to maximize reward or minimize punishment. This article describes the broad characteristics of AI-based reinforcement learning (RL), how those differ from operant research, and how combining insights from each might advance research in both domains. To demonstrate this mutual utility, 12 artificial organisms (AOs) were built for six participants to predict the next response they emitted. Each AO used one of six combinations of feature sets informed by operant research, with or without punishing incorrect predictions. A 13th predictive approach, termed "human choice modeled by Q-learning," uses the mechanism of Q-learning to update context-response-outcome values following each response and to choose the next response. This approach achieved the highest average predictive accuracy of 95% (range: 90%-99%). The next highest accuracy, averaging 89% (range: 85%-93%), required molecular and molar information and punishment contingencies. Predictions based only on molar or molecular information and with punishment contingencies averaged 71%-72% accuracy. Without punishment, prediction accuracy dropped to 47%-54%, regardless of the feature set. This work highlights how AI-based RL techniques, combined with operant and respondent domain knowledge, can enhance behavior scientists' ability to predict the behavior of organisms. These techniques also allow researchers to address theoretical questions about important topics such as multiscale models of behavior and the role of punishment in learning.
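The abstract describes updating context-response-outcome values via Q-learning and punishing incorrect predictions with negative outcomes. A minimal, generic tabular Q-learning sketch of that idea is shown below; all names, parameter values (alpha, gamma, epsilon), and the toy "participant" are illustrative assumptions, not the authors' actual implementation or data.

```python
import random
from collections import defaultdict

class QLearner:
    """Tabular Q-learning over (context, response) pairs.

    Illustrative sketch only: the learning rate, discount, and
    exploration rate are assumed values, not ones reported in the article.
    """

    def __init__(self, responses, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)          # Q[(context, response)] -> value
        self.responses = list(responses)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, context):
        """Epsilon-greedy choice of the predicted next response."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.responses)
        return max(self.responses, key=lambda r: self.q[(context, r)])

    def update(self, context, response, reward, next_context):
        """Standard Q-learning update after observing the outcome.

        Here reward is +1 for a correct prediction and -1 for an
        incorrect one, i.e., incorrect predictions are punished.
        """
        best_next = max(self.q[(next_context, r)] for r in self.responses)
        td_target = reward + self.gamma * best_next
        self.q[(context, response)] += self.alpha * (
            td_target - self.q[(context, response)]
        )

# Toy usage: a hypothetical participant who always emits response "A"
# in context "c1"; the learner's prediction converges to "A".
learner = QLearner(responses=["A", "B"])
for _ in range(200):
    prediction = learner.choose("c1")
    reward = 1.0 if prediction == "A" else -1.0   # punish incorrect predictions
    learner.update("c1", prediction, reward, "c1")
```

Under these assumptions, the learned value of the correct response comes to exceed that of the incorrect one, so the greedy choice predicts the participant's next response.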
About the journal
Perspectives on Behavior Science is an official publication of the Association for Behavior Analysis International. Published quarterly, it carries articles on theoretical, experimental, and applied topics in behavior analysis, along with literature reviews, re-interpretations of published data, and articles on behaviorism as a philosophy.