RVBench: Role values benchmark for role-playing LLMs

Ye Wang, Tong Li, Meixuan Li, Ziyue Cheng, Ge Wang, Hanyue Kang, Yaling Deng, Hongjiang Xiao, Yuan Zhang
{"title":"RVBench:角色扮演法学硕士的角色价值基准","authors":"Ye Wang , Tong Li , Meixuan Li , Ziyue Cheng , Ge Wang , Hanyue Kang , Yaling Deng , Hongjiang Xiao , Yuan Zhang","doi":"10.1016/j.chbah.2025.100184","DOIUrl":null,"url":null,"abstract":"<div><div>With the explosive development of Large Language Models (LLMs), the demand for role-playing agents has greatly increased to promote applications such as personalized digital companion and artificial society simulation. In LLM-driven role-playing, the values of agents lay the foundation for their attitudes and behaviors, thus alignment of values is crucial in enhancing the realism of interactions and enriching the user experience. However, a benchmark for evaluating values in role-playing LLMs is absent. In this study, we built a Role Values Dataset (RVD) containing 25 roles as the groundtruth. Additionally, inspired by psychological tests in humans, we proposed a Role Values Benchmark (RVBench) including values rating and values ranking methods to evaluate the values of role-playing LLMs from subjective questionnaires and observed behavior. The values rating method tests the values orientation through the revised Portrait Values Questionnaire (PVQ-RR), which provides a direct and quantitative comparison of the roles to be played. The values ranking method assesses whether the behaviors of agents are consistent with their values’ hierarchical organization when encountering dilemmatic scenarios. Subsequent testing on a selection of both open-source and closed-source LLMs revealed that GLM-4 exhibited values most closely mirroring the roles in the RVD. However, compared to preset roles, there is still a certain gap in the role-playing ability of LLMs, including the consistency, stability and flexibility in value dimensions. These findings prompt a vital need for further research aimed at refining the role-playing capacities of LLMs from a value alignment perspective. The RVD is available at: <span><span>https://github.com/northwang/RVD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":100324,"journal":{"name":"Computers in Human Behavior: Artificial Humans","volume":"5 ","pages":"Article 100184"},"PeriodicalIF":0.0000,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RVBench: Role values benchmark for role-playing LLMs\",\"authors\":\"Ye Wang , Tong Li , Meixuan Li , Ziyue Cheng , Ge Wang , Hanyue Kang , Yaling Deng , Hongjiang Xiao , Yuan Zhang\",\"doi\":\"10.1016/j.chbah.2025.100184\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the explosive development of Large Language Models (LLMs), the demand for role-playing agents has greatly increased to promote applications such as personalized digital companion and artificial society simulation. In LLM-driven role-playing, the values of agents lay the foundation for their attitudes and behaviors, thus alignment of values is crucial in enhancing the realism of interactions and enriching the user experience. However, a benchmark for evaluating values in role-playing LLMs is absent. In this study, we built a Role Values Dataset (RVD) containing 25 roles as the groundtruth. Additionally, inspired by psychological tests in humans, we proposed a Role Values Benchmark (RVBench) including values rating and values ranking methods to evaluate the values of role-playing LLMs from subjective questionnaires and observed behavior. 
The values rating method tests the values orientation through the revised Portrait Values Questionnaire (PVQ-RR), which provides a direct and quantitative comparison of the roles to be played. The values ranking method assesses whether the behaviors of agents are consistent with their values’ hierarchical organization when encountering dilemmatic scenarios. Subsequent testing on a selection of both open-source and closed-source LLMs revealed that GLM-4 exhibited values most closely mirroring the roles in the RVD. However, compared to preset roles, there is still a certain gap in the role-playing ability of LLMs, including the consistency, stability and flexibility in value dimensions. These findings prompt a vital need for further research aimed at refining the role-playing capacities of LLMs from a value alignment perspective. The RVD is available at: <span><span>https://github.com/northwang/RVD</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":100324,\"journal\":{\"name\":\"Computers in Human Behavior: Artificial Humans\",\"volume\":\"5 \",\"pages\":\"Article 100184\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-07-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers in Human Behavior: Artificial Humans\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2949882125000684\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers in Human Behavior: Artificial Humans","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949882125000684","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
With the rapid development of Large Language Models (LLMs), demand for role-playing agents has grown sharply, driving applications such as personalized digital companions and artificial-society simulation. In LLM-driven role-playing, an agent's values underpin its attitudes and behaviors, so value alignment is crucial for making interactions realistic and enriching the user experience. However, no benchmark exists for evaluating the values of role-playing LLMs. In this study, we built a Role Values Dataset (RVD) containing 25 roles as the ground truth. Inspired by psychological testing in humans, we then proposed a Role Values Benchmark (RVBench), which evaluates the values of role-playing LLMs through both subjective questionnaires and observed behavior, using values-rating and values-ranking methods. The values-rating method measures value orientation with the revised Portrait Values Questionnaire (PVQ-RR), enabling a direct, quantitative comparison against the roles being played. The values-ranking method assesses whether agents' behavior in dilemma scenarios is consistent with the hierarchical organization of their values. Testing a selection of open-source and closed-source LLMs revealed that GLM-4 exhibited values most closely mirroring the roles in the RVD. Nevertheless, a gap remains between LLMs and the preset roles in role-playing ability, specifically in the consistency, stability, and flexibility of value dimensions. These findings call for further research on refining the role-playing capacities of LLMs from a value-alignment perspective. The RVD is available at: https://github.com/northwang/RVD.
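To make the two evaluation ideas concrete, the Python sketch below illustrates how a rating-based and a ranking-based comparison against a ground-truth role profile could work. The abstract does not specify RVBench's actual scoring rules, so the value dimensions, the Pearson/Spearman metrics, and all numbers here are assumptions for illustration only, not the authors' implementation.

    # Illustrative only: the real RVBench scoring is defined in the paper, not here.
    import numpy as np
    from scipy.stats import spearmanr

    # A subset of Schwartz-style value dimensions measured by the PVQ-RR (assumed).
    VALUE_DIMS = ["benevolence", "universalism", "self-direction",
                  "achievement", "power", "security"]

    def rating_similarity(model_profile, role_profile):
        """Values rating: compare the model's PVQ-RR profile for a role with the
        ground-truth profile from the RVD (Pearson correlation, an assumption)."""
        m = np.asarray(model_profile, dtype=float)
        r = np.asarray(role_profile, dtype=float)
        return float(np.corrcoef(m, r)[0, 1])

    def ranking_consistency(model_profile, role_profile):
        """Values ranking: check whether the value priorities implied by the
        model's dilemma-scenario behavior match the role's value hierarchy
        (Spearman rank correlation, an assumption; spearmanr ranks internally)."""
        rho, _ = spearmanr(model_profile, role_profile)
        return float(rho)

    # Toy numbers: mean PVQ-RR ratings on its 1-6 scale for one role.
    role_profile  = [5.2, 4.8, 4.1, 3.0, 1.9, 3.5]   # RVD ground truth (invented)
    model_profile = [4.9, 4.5, 4.4, 3.2, 2.4, 3.1]   # a role-playing LLM (invented)

    print(f"rating similarity (Pearson r):      {rating_similarity(model_profile, role_profile):.3f}")
    print(f"ranking consistency (Spearman rho): {ranking_consistency(model_profile, role_profile):.3f}")

A high rating similarity would indicate the model reproduces the role's overall value profile, while a high ranking consistency would indicate its choices under dilemmas respect the role's value priorities; how RVBench actually aggregates these signals across 25 roles is detailed in the paper, not here.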