A Reasoning and Value Alignment Test to Assess Advanced GPT Reasoning

IF 4.3, Zone 3 (Materials Science), Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Timothy R. McIntosh, Tong Liu, Teo Susnjak, Paul Watters, Malka N. Halgamuge
DOI: 10.1145/3670691 (https://doi.org/10.1145/3670691)
Journal: ACS Applied Electronic Materials
Publication date: 2024-06-03
Publication type: Journal Article
Citations: 0

Abstract

In response to diverse perspectives on Artificial General Intelligence (AGI), ranging from safety and ethical concerns to more extreme views about the threats it poses to humanity, this research presents a generic method to gauge the reasoning capabilities of Artificial Intelligence (AI) models as a foundational step in evaluating safety measures. Recognizing that measures of AI reasoning cannot be wholly automated, owing to factors such as cultural complexity, we conducted an extensive examination of five commercial Generative Pre-trained Transformers (GPTs), focusing on their comprehension and interpretation of culturally intricate contexts. Using our novel “Reasoning and Value Alignment Test”, we assessed the GPT models’ ability to reason in complex situations and grasp local cultural subtleties. Our findings indicated that, although the models exhibited high levels of human-like reasoning, significant limitations remained, especially in interpreting cultural contexts. We also explored potential applications and use cases of the Test, underlining its significance for AI training, ethics compliance, sensitivity auditing, and AI-driven cultural consultation. We concluded by emphasizing its broader implications for the AGI domain, highlighting the need for interdisciplinary approaches, wider access to various GPT models, and a deeper understanding of the interplay between GPT reasoning and cultural sensitivity.

Source journal: CiteScore 7.20; self-citation rate 4.30%; articles published 567