Stereotypical bias amplification and reversal in an experimental model of human interaction with generative artificial intelligence

Kevin Allan, Jacobo Azcona, Somayajulu Sripada, Georgios Leontidis, Clare A M Sutherland, Louise H Phillips, Douglas Martin

Royal Society Open Science, 12(4), 241472 (published 2025-04-09). DOI: 10.1098/rsos.241472
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11979296/pdf/
Citations: 0
Abstract
Stereotypical biases are readily acquired and expressed by generative artificial intelligence (AI), causing growing societal concern about these systems amplifying existing human bias. This concern rests on reasonable psychological assumptions, but stereotypical bias amplification during human-AI interaction relative to pre-existing baseline levels has not been demonstrated. Here, we use previous psychological work on gendered character traits to capture and control gender stereotypes expressed in character descriptions generated by OpenAI's GPT-3.5. In four experiments (N = 782) with a first impressions task, we find that unexplained ('black-box') character recommendations using stereotypical traits already convey a potent persuasive influence, significantly amplifying baseline stereotyping within first impressions. Counter-stereotypical recommendations eliminate and effectively reverse human baseline bias, but these stereotype-challenging influences propagate less well than the reinforcing influences of stereotypical recommendations. Critically, the bias amplification and reversal phenomena also occur when GPT-3.5 elaborates on the core stereotypical content, and GPT-3.5's explanations propagate counter-stereotypical influence more effectively and persuasively than black-box recommendations. Our findings strongly imply that without robust safeguards, generative AI will amplify existing bias. But with safeguards, existing bias can be eliminated and even reversed. Our novel approach safely allows such effects to be studied in various contexts where gender and other bias-inducing social stereotypes operate.
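The abstract does not include the study's stimuli or analysis code, but the core design it describes (recommendations that either reinforce or reverse a gender-trait pairing, scored against a no-recommendation baseline) can be illustrated with a minimal sketch. The trait lists, prompt wording, function names and placeholder judgements below are assumptions for illustration only, not the authors' materials or results.

```python
# Minimal illustrative sketch, NOT the authors' materials or analysis code.
# Trait lists, prompt wording and the stereotyping index are assumptions.

from statistics import mean

# Placeholder gendered trait pairs; the study draws its traits from prior
# psychological work on gendered character traits.
STEREOTYPICAL_TRAITS = {
    "male": ["assertive", "competitive"],
    "female": ["warm", "nurturing"],
}


def recommendation_prompt(character_gender: str, counter_stereotypical: bool = False) -> str:
    """Build a character-recommendation prompt that either reinforces or
    reverses the gender-trait pairing (the experimental manipulation)."""
    other = "female" if character_gender == "male" else "male"
    traits = STEREOTYPICAL_TRAITS[other if counter_stereotypical else character_gender]
    return (
        f"Recommend this {character_gender} character by describing them as "
        f"{' and '.join(traits)}."
    )


def stereotyping_index(judgements: list[int]) -> float:
    """Proportion of stereotype-consistent first-impression judgements
    (1 = consistent with the gender stereotype, 0 = inconsistent)."""
    return mean(judgements)


if __name__ == "__main__":
    print(recommendation_prompt("female"))                              # reinforcing
    print(recommendation_prompt("female", counter_stereotypical=True))  # reversing

    # Placeholder judgements, not study data: amplification is the change in
    # the index relative to the no-recommendation baseline condition.
    baseline = [1, 0, 1, 1, 0]
    with_stereotypical_ai = [1, 1, 1, 1, 0]
    print(stereotyping_index(with_stereotypical_ai) - stereotyping_index(baseline))
```

In the experiments themselves the recommendations were produced by GPT-3.5, either as bare "black-box" recommendations or with elaborated explanations; the sketch only shows how the stereotype-reinforcing versus stereotype-reversing framing and the baseline-relative amplification measure fit together.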
About the journal:
Royal Society Open Science is an open-access journal publishing high-quality original research across the entire range of science and mathematics on the basis of objective peer review. The journal allows the Society to publish all the high-quality work it receives without the usual restrictions on scope, length or impact.