{"title":"与互动叙事中的社会规范和价值观保持一致","authors":"Prithviraj Ammanabrolu, Liwei Jiang, Maarten Sap, Hannaneh Hajishirzi, Yejin Choi","doi":"10.48550/arXiv.2205.01975","DOIUrl":null,"url":null,"abstract":"We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games—environments wherein an agent perceives and interacts with a world through natural language. Such interactive agents are often trained via reinforcement learning to optimize task performance, even when such rewards may lead to agent behaviors that violate societal norms—causing harm either to the agent itself or other entities in the environment. Social value alignment refers to creating agents whose behaviors conform to expected moral and social norms for a given context and group of people—in our case, it means agents that behave in a manner that is less harmful and more beneficial for themselves and others.We build on the Jiminy Cricket benchmark (Hendrycks et al. 2021), a set of 25 annotated interactive narratives containing thousands of morally salient scenarios covering everything from theft and bodily harm to altruism. We introduce the GALAD (Game-value ALignment through Action Distillation) agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values. An experimental study shows that the GALAD agent makes decisions efficiently enough to improve state-of-the-art task performance by 4% while reducing the frequency of socially harmful behaviors by 25% compared to strong contemporary value alignment approaches.","PeriodicalId":382084,"journal":{"name":"North American Chapter of the Association for Computational Linguistics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Aligning to Social Norms and Values in Interactive Narratives\",\"authors\":\"Prithviraj Ammanabrolu, Liwei Jiang, Maarten Sap, Hannaneh Hajishirzi, Yejin Choi\",\"doi\":\"10.48550/arXiv.2205.01975\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games—environments wherein an agent perceives and interacts with a world through natural language. Such interactive agents are often trained via reinforcement learning to optimize task performance, even when such rewards may lead to agent behaviors that violate societal norms—causing harm either to the agent itself or other entities in the environment. Social value alignment refers to creating agents whose behaviors conform to expected moral and social norms for a given context and group of people—in our case, it means agents that behave in a manner that is less harmful and more beneficial for themselves and others.We build on the Jiminy Cricket benchmark (Hendrycks et al. 2021), a set of 25 annotated interactive narratives containing thousands of morally salient scenarios covering everything from theft and bodily harm to altruism. 
We introduce the GALAD (Game-value ALignment through Action Distillation) agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values. An experimental study shows that the GALAD agent makes decisions efficiently enough to improve state-of-the-art task performance by 4% while reducing the frequency of socially harmful behaviors by 25% compared to strong contemporary value alignment approaches.\",\"PeriodicalId\":382084,\"journal\":{\"name\":\"North American Chapter of the Association for Computational Linguistics\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"North American Chapter of the Association for Computational Linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2205.01975\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"North American Chapter of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2205.01975","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 22
Abstract
We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games—environments wherein an agent perceives and interacts with a world through natural language. Such interactive agents are often trained via reinforcement learning to optimize task performance, even when such rewards may lead to agent behaviors that violate societal norms—causing harm either to the agent itself or other entities in the environment. Social value alignment refers to creating agents whose behaviors conform to expected moral and social norms for a given context and group of people—in our case, it means agents that behave in a manner that is less harmful and more beneficial for themselves and others. We build on the Jiminy Cricket benchmark (Hendrycks et al. 2021), a set of 25 annotated interactive narratives containing thousands of morally salient scenarios covering everything from theft and bodily harm to altruism. We introduce the GALAD (Game-value ALignment through Action Distillation) agent that uses the social commonsense knowledge present in specially trained language models to contextually restrict its action space to only those actions that are aligned with socially beneficial values. An experimental study shows that the GALAD agent makes decisions efficiently enough to improve state-of-the-art task performance by 4% while reducing the frequency of socially harmful behaviors by 25% compared to strong contemporary value alignment approaches.
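The core mechanism the abstract describes is contextual action-space restriction: candidate actions are scored by a norm-aware language model, and only those judged socially acceptable are passed on to the policy. Below is a minimal Python sketch of that idea. The ValueModel interface, the harm_score method, and the 0.5 threshold are all illustrative assumptions, not the paper's actual API.

```python
# Minimal sketch of value-aligned action filtering, in the spirit of GALAD.
# ValueModel, harm_score, and the threshold are illustrative assumptions;
# they do not reproduce the paper's actual implementation.
from typing import List


class ValueModel:
    """Stand-in for a language model trained on social-norm judgments."""

    def harm_score(self, context: str, action: str) -> float:
        # Assumed to return a value in [0, 1]; higher means more harmful.
        raise NotImplementedError


def restrict_action_space(value_model: ValueModel,
                          context: str,
                          candidates: List[str],
                          harm_threshold: float = 0.5) -> List[str]:
    """Keep only the candidate actions the value model judges
    acceptable in the current narrative context."""
    aligned = [a for a in candidates
               if value_model.harm_score(context, a) < harm_threshold]
    # If every candidate is flagged, fall back to the full set so the
    # agent is never left without a legal action.
    return aligned or candidates
```

In this sketch, an RL policy would then select only among the filtered actions at each step, e.g. `action = policy.select(observation, restrict_action_space(vm, observation, candidates))`, so value alignment constrains exploration rather than reshaping the task reward itself.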