Does Linguistic Relativity Hypothesis Apply on ChatGPT Responses? Yes, It Does

IF 1.8, Zone 4 (Computer Science), JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Computational Intelligence, Vol. 41, No. 4. Pub Date: 2025-07-10. DOI: 10.1111/coin.70103

Partha Pratim Ray


Citations: 0

Abstract

We present the first comprehensive, end-to-end quantitative evaluation of the linguistic relativity hypothesis in AI-generated text, using ChatGPT-4o mini to generate responses to 10 culturally salient prompts across 13 typologically diverse languages. Semantic shifts were quantified using pairwise cosine similarity scores computed from multilingual MiniLM sentence embeddings. A one-way analysis of variance (ANOVA) reveals statistically significant variation in semantic alignment across language pairs, with $F(77, 702) = 2.153$, $p = 2.29 \times 10^{-7}$, and effect size $\eta^{2} = 0.191$. These results are further supported by a non-parametric Kruskal–Wallis test yielding $H = 176.208$, $p = 9.59 \times 10^{-10}$, indicating robust differences in distribution. Prompt-specific semantic shifts also exhibit significant variation, as shown by the ANOVA results $F(9, 770) = 24.239$, $p = 1.00 \times 10^{-36}$, and $\eta^{2} = 0.221$. Sentiment polarity analysis using the Polyglot toolkit reveals significant effects of language on sentiment distribution, with $F(12, 117) = 2.637$, $p = 0.0037$, and $\eta^{2} = 0.213$. Disaggregated analysis shows that positivity ratios differ by prompt ($F(9, 120) = 3.621$, $p = 0.0005$, $\eta^{2} = 0.214$), while negativity scores display even greater divergence across prompts, with $F(9, 120) = 12.755$, $p = 4.59 \times 10^{-14}$, and $\eta^{2} = 0.489$. An unsupervised clustering procedure ($k = 3$) classifies languages into three distinct groups based on semantic alignment: (i) high-alignment ($\text{mean similarity} \geq 0.90$), (ii) intermediate ($\text{mean similarity} \approx 0.75\text{--}0.85$), and (iii) neutral-tone clusters. Each group exhibits distinctive polarity profiles, with median sentiment polarity ranging from $-0.02$ to $0.11$. These results demonstrate that linguistic structures exert a measurable influence on AI-generated content, underscoring the need for culturally sensitive AI design practices. These results affirm that ChatGPT-4o mini's outputs align with the linguistic relativity hypothesis, clearly illustrating that language structures significantly shape AI-driven interpretation. All associated code and data are available in the GitHub repository: https://github.com/ParthaPRay/Liguistic_Relativity_Chatgpt.
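The reported degrees of freedom and effect sizes can be cross-checked with a short sketch. It is not taken from the paper's repository. It assumes, as the degrees of freedom suggest but the abstract does not state outright, one response per language per prompt and one similarity score per language pair per prompt. It uses the standard one-way ANOVA identity $\eta^{2} = F \cdot df_{1} / (F \cdot df_{1} + df_{2})$. The function names are illustrative.

```python
from math import comb


def pair_count(n_languages: int) -> int:
    """Number of unordered language pairs among n_languages."""
    return comb(n_languages, 2)


def anova_df(n_groups: int, n_observations: int) -> tuple[int, int]:
    """(between, within) degrees of freedom for a one-way ANOVA."""
    return n_groups - 1, n_observations - n_groups


def eta_squared(f_stat: float, df_between: int, df_within: int) -> float:
    """Recover eta^2 = SS_between / SS_total from F and its degrees of freedom."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)


if __name__ == "__main__":
    n_languages, n_prompts = 13, 10              # from the abstract
    n_pairs = pair_count(n_languages)            # language pairs
    n_similarities = n_pairs * n_prompts         # assumed: one score per pair per prompt
    n_sentiments = n_languages * n_prompts       # assumed: one response per language per prompt

    # (label, number of groups, number of observations, reported F)
    tests = [
        ("similarity by language pair", n_pairs, n_similarities, 2.153),
        ("similarity by prompt", n_prompts, n_similarities, 24.239),
        ("sentiment by language", n_languages, n_sentiments, 2.637),
        ("positivity by prompt", n_prompts, n_sentiments, 3.621),
        ("negativity by prompt", n_prompts, n_sentiments, 12.755),
    ]
    for label, groups, observations, f_stat in tests:
        df1, df2 = anova_df(groups, observations)
        print(f"{label}: F({df1}, {df2}) = {f_stat}, "
              f"eta^2 = {eta_squared(f_stat, df1, df2):.3f}")
```

Under these assumptions the degrees of freedom match those in the abstract, and the identity reproduces each reported $\eta^{2}$ to three decimals, so the statistics are internally consistent with a 13-language, 10-prompt design.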


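Source journal

Computational Intelligence (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore

6.90

Self-citation rate

3.60%

Articles published

Review time

>12 weeks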

About the journal: This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.