Predicting the Next Response: Demonstrating the Utility of Integrating Artificial Intelligence-Based Reinforcement Learning with Behavior Science
Authors: David J Cox, Carlos Santos
DOI: 10.1007/s40614-025-00444-6 (https://doi.org/10.1007/s40614-025-00444-6)
Journal: Perspectives on Behavior Science, 48(2), 241-267
Published: 2025-04-30 (eCollection 2025-06-01)
Journal Impact Factor: 2.5 (JCR Q2, Psychology, Clinical)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12162397/pdf/
Citations: 0
Abstract
The concepts of reinforcement and punishment arose in two disparate scientific domains: psychology and artificial intelligence (AI). Behavior scientists study how biological organisms do behave as a function of their environment, whereas AI focuses on how artificial agents should behave to maximize reward or minimize punishment. This article describes the broad characteristics of AI-based reinforcement learning (RL), how those differ from operant research, and how combining insights from each might advance research in both domains. To demonstrate this mutual utility, 12 artificial organisms (AOs) were built for six participants to predict the next response they emitted. Each AO used one of six combinations of feature sets informed by operant research, with or without punishing incorrect predictions. A 13th predictive approach, termed "human choice modeled by Q-learning," uses the mechanism of Q-learning to update context-response-outcome values following each response and to choose the next response. This approach achieved the highest average predictive accuracy of 95% (range: 90%-99%). The next highest accuracy, averaging 89% (range: 85%-93%), required molecular and molar information and punishment contingencies. Predictions based only on molar or molecular information and with punishment contingencies averaged 71%-72% accuracy. Without punishment, prediction accuracy dropped to 47%-54%, regardless of the feature set. This work highlights how AI-based RL techniques, combined with operant and respondent domain knowledge, can enhance behavior scientists' ability to predict the behavior of organisms. These techniques also allow researchers to address theoretical questions about important topics such as multiscale models of behavior and the role of punishment in learning.
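The abstract describes updating context-response-outcome values via Q-learning and punishing incorrect predictions with negative outcomes. A minimal, generic tabular Q-learning sketch of that idea is shown below; all names, parameter values (alpha, gamma, epsilon), and the toy "participant" are illustrative assumptions, not the authors' actual implementation or data.

```python
import random
from collections import defaultdict

class QLearner:
    """Tabular Q-learning over (context, response) pairs.

    Illustrative sketch only: the learning rate, discount, and
    exploration rate are assumed values, not ones reported in the article.
    """

    def __init__(self, responses, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)          # Q[(context, response)] -> value
        self.responses = list(responses)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, context):
        """Epsilon-greedy choice of the predicted next response."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.responses)
        return max(self.responses, key=lambda r: self.q[(context, r)])

    def update(self, context, response, reward, next_context):
        """Standard Q-learning update after observing the outcome.

        Here reward is +1 for a correct prediction and -1 for an
        incorrect one, i.e., incorrect predictions are punished.
        """
        best_next = max(self.q[(next_context, r)] for r in self.responses)
        td_target = reward + self.gamma * best_next
        self.q[(context, response)] += self.alpha * (
            td_target - self.q[(context, response)]
        )

# Toy usage: a hypothetical participant who always emits response "A"
# in context "c1"; the learner's prediction converges to "A".
learner = QLearner(responses=["A", "B"])
for _ in range(200):
    prediction = learner.choose("c1")
    reward = 1.0 if prediction == "A" else -1.0   # punish incorrect predictions
    learner.update("c1", prediction, reward, "c1")
```

Under these assumptions, the learned value of the correct response comes to exceed that of the incorrect one, so the greedy choice predicts the participant's next response.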
About the journal
Perspectives on Behavior Science is an official publication of the Association for Behavior Analysis International. Published quarterly, it carries articles on theoretical, experimental, and applied topics in behavior analysis, along with literature reviews, re-interpretations of published data, and articles on behaviorism as a philosophy.