Title: Does Linguistic Relativity Hypothesis Apply on ChatGPT Responses? Yes, It Does
Author: Partha Pratim Ray
Journal: Computational Intelligence, vol. 41, no. 4 (published 2025-07-10)
Journal metrics: impact factor 1.8; JCR Q3 (Computer Science, Artificial Intelligence)
DOI: 10.1111/coin.70103
URL: https://onlinelibrary.wiley.com/doi/10.1111/coin.70103
Citations: 0
Abstract
We present the first comprehensive, end-to-end quantitative evaluation of the linguistic relativity hypothesis in AI-generated text, using ChatGPT-4o mini to generate responses to 10 culturally salient prompts across 13 typologically diverse languages. Semantic shifts were quantified using pairwise cosine similarity scores computed from multilingual MiniLM sentence embeddings. A one-way analysis of variance (ANOVA) reveals statistically significant variation in semantic alignment across language pairs, with $F(77, 702) = 2.153$, $p = 2.29 \times 10^{-7}$, and effect size $\eta^2 = 0.191$. These results are further supported by a non-parametric Kruskal–Wallis test yielding $H = 176.208$, $p = 9.59 \times 10^{-10}$, indicating robust differences in distribution. Prompt-specific semantic shifts also exhibit significant variation, as shown by ANOVA results $F(9, 770) = 24.239$, $p = 1.00 \times 10^{-36}$, and $\eta^2 = 0.221$. Sentiment polarity analysis using the Polyglot toolkit reveals significant effects of language on sentiment distribution, with $F(12, 117) = 2.637$, $p = 0.0037$, and $\eta^2 = 0.213$. Disaggregated analysis shows that positivity ratios differ by prompt ($F(9, 120) = 3.621$, $p = 0.0005$, $\eta^2 = 0.214$), while negativity scores display even greater divergence across prompts with $F(9, 120) = 12.755$, $p = 4.59 \times 10^{-14}$, and $\eta^2 = 0.489$. An unsupervised clustering procedure ($k = 3$) classifies languages into three distinct groups based on semantic alignment: (i) high-alignment (mean similarity $\geq 0.90$), (ii) intermediate (mean similarity $\approx 0.75$–$0.85$), and (iii) neutral-tone clusters. Each group exhibits distinctive polarity profiles, with median sentiment polarity ranging from $-0.02$ to $0.11$. These results demonstrate that linguistic structures exert a measurable influence on AI-generated content, underscoring the need for culturally sensitive AI design practices. These results affirm that ChatGPT-4o mini's outputs align with the linguistic relativity hypothesis, clearly illustrating that language structures significantly shape AI-driven interpretation. All associated code and data are available in the GitHub repository:
https://github.com/ParthaPRay/Liguistic_Relativity_Chatgpt.
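The core measurements named in the abstract (pairwise cosine similarity between per-language embeddings, then a one-way ANOVA with an $\eta^2$ effect size) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: the function names are hypothetical, the three-dimensional vectors and the grouped scores are toy values standing in for multilingual MiniLM embeddings and real similarity data, and everything is written in plain Python rather than with whatever libraries the repository uses.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(embeddings):
    """Map each unordered language pair to the cosine similarity of its vectors."""
    return {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for a, b in combinations(sorted(embeddings), 2)
    }


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    # Between-group sum of squares: group means against the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations against their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


def eta_squared_from_f(f_stat, df_between, df_within):
    """Recover eta squared from a reported F statistic and its degrees of freedom."""
    return f_stat * df_between / (f_stat * df_between + df_within)


# Toy "embeddings" for three languages (hypothetical values, not the paper's data).
toy_embeddings = {
    "en": [1.0, 0.0, 0.0],
    "fr": [1.0, 1.0, 0.0],
    "hi": [0.0, 1.0, 0.0],
}
sims = pairwise_similarities(toy_embeddings)

# Toy similarity scores, one inner list per language pair (hypothetical).
toy_groups = [[0.90, 0.92, 0.94], [0.80, 0.82, 0.84], [0.70, 0.75, 0.80]]
f_stat, df_b, df_w, eta2 = one_way_anova(toy_groups)
```

For a one-way design, $\eta^2$ follows from $F$ and the degrees of freedom alone, so the reported effect sizes can be checked against the reported test statistics. For instance, `eta_squared_from_f(2.153, 77, 702)` rounds to 0.191, matching the language-pair result above. The degrees of freedom are also consistent with 13 languages forming 78 unordered pairs, each scored on 10 prompts (780 observations in total).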
About the journal:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.