Establishing Human Observer Criterion in Evaluating Artificial Social Intelligence Agents in a Search and Rescue Task
Lixiao Huang, Jared Freeman, Nancy J Cooke, Myke C Cohen, Xiaoyun Yin, Jeska Clark, Matt Wood, Verica Buchanan, Christopher Corral, Federico Scholcover, Anagha Mudigonda, Lovein Thomas, Aaron Teo, John Colonna-Romano
Topics in Cognitive Science, pp. 349-373. Published 2025-04-01 (Epub 2023-04-13). DOI: 10.1111/tops.12648 (https://doi.org/10.1111/tops.12648)
Abstract
Artificial social intelligence (ASI) agents have great potential to aid the success of individuals, human-human teams, and human-artificial intelligence teams. To develop helpful ASI agents, we created an urban search and rescue task environment in Minecraft to evaluate ASI agents' ability to infer participants' knowledge training conditions and to predict the type of victim participants would rescue next. We evaluated ASI agents' capabilities in three ways: (a) comparison to ground truth (the actual knowledge training condition and participant actions); (b) comparison among different ASI agents; and (c) comparison to a human observer criterion, whose accuracy served as a reference point. The human observers used video data from the testbed, and the ASI agents used timestamped event messages, to make inferences about the same participants and topic (knowledge training condition) and the same instances of participant actions (rescue of victims). Overall, ASI agents performed better than human observers in inferring knowledge training conditions and predicting actions. Refining the human criterion can guide the design and evaluation of ASI agents for complex task environments and team composition.
About the journal:
Topics in Cognitive Science (topiCS) is an innovative new journal that covers all areas of cognitive science including cognitive modeling, cognitive neuroscience, cognitive anthropology, and cognitive science and philosophy. topiCS aims to provide a forum for:
- New communities of researchers
- New controversies in established areas
- Debates and commentaries
- Reflections and integration
The publication features multiple scholarly papers dedicated to a single topic. Some of these topics will appear together in one issue, but others may appear across several issues or develop into a regular feature. Controversies or debates started in one issue may be followed up by commentaries in a later issue, etc. However, the format and origin of the topics will vary greatly.