Generative language models exhibit social identity biases

IF 18.3 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Nature computational science Pub Date : 2024-12-12 DOI:10.1038/s43588-024-00741-1

Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek

{"title":"Generative language models exhibit social identity biases","authors":"Tiancheng Hu, Yara Kyrychenko, Steve Rathje, Nigel Collier, Sander van der Linden, Jon Roozenbeek","doi":"10.1038/s43588-024-00741-1","DOIUrl":null,"url":null,"abstract":"Social identity biases, particularly the tendency to favor one’s own group (ingroup solidarity) and derogate other groups (outgroup hostility), are deeply rooted in human psychology and social behavior. However, it is unknown if such biases are also present in artificial intelligence systems. Here we show that large language models (LLMs) exhibit patterns of social identity bias, similarly to humans. By administering sentence completion prompts to 77 different LLMs (for instance, ‘We are…’), we demonstrate that nearly all base models and some instruction-tuned and preference-tuned models display clear ingroup favoritism and outgroup derogation. These biases manifest both in controlled experimental settings and in naturalistic human–LLM conversations. However, we find that careful curation of training data and specialized fine-tuning can substantially reduce bias levels. These findings have important implications for developing more equitable artificial intelligence systems and highlight the urgent need to understand how human–LLM interactions might reinforce existing social biases. Researchers show that large language models exhibit social identity biases similar to humans, having favoritism toward ingroups and hostility toward outgroups. These biases persist across models, training data and real-world human–LLM conversations.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"5 1","pages":"65-75"},"PeriodicalIF":18.3000,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774750/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nature computational science","FirstCategoryId":"1085","ListUrlMain":"https://www.nature.com/articles/s43588-024-00741-1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 0

Abstract

Social identity biases, particularly the tendency to favor one’s own group (ingroup solidarity) and derogate other groups (outgroup hostility), are deeply rooted in human psychology and social behavior. However, it is unknown if such biases are also present in artificial intelligence systems. Here we show that large language models (LLMs) exhibit patterns of social identity bias, similarly to humans. By administering sentence completion prompts to 77 different LLMs (for instance, ‘We are…’), we demonstrate that nearly all base models and some instruction-tuned and preference-tuned models display clear ingroup favoritism and outgroup derogation. These biases manifest both in controlled experimental settings and in naturalistic human–LLM conversations. However, we find that careful curation of training data and specialized fine-tuning can substantially reduce bias levels. These findings have important implications for developing more equitable artificial intelligence systems and highlight the urgent need to understand how human–LLM interactions might reinforce existing social biases. Researchers show that large language models exhibit social identity biases similar to humans, having favoritism toward ingroups and hostility toward outgroups. These biases persist across models, training data and real-world human–LLM conversations.

Abstract Image

查看原文本刊更多论文

生成语言模型表现出社会身份偏见。

社会身份偏见，特别是倾向于支持自己的群体（群体内团结）和贬低其他群体（群体外敌意），深深植根于人类的心理和社会行为。然而，这种偏见是否也存在于人工智能系统中尚不清楚。在这里，我们展示了大型语言模型（llm）表现出与人类相似的社会身份偏见模式。通过对77个不同的法学硕士进行句子补全提示（例如，“我们是……”），我们证明了几乎所有的基本模型和一些指令调整和偏好调整的模型都显示出明显的群体内偏爱和群体外背离。这些偏见在受控的实验环境和自然的人与法学硕士的对话中都表现出来。然而，我们发现仔细管理训练数据和专门的微调可以大大降低偏差水平。这些发现对开发更公平的人工智能系统具有重要意义，并强调迫切需要了解人类与法学硕士的互动如何强化现有的社会偏见。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Nature computational science

CiteScore

11.70

自引率

0.00%

发文量