Does Linguistic Relativity Hypothesis Apply on ChatGPT Responses? Yes, It Does

Authors: Partha Pratim Ray
Journal: Computational Intelligence, 41 (4)
Publication date: 2025-07-10
Publication type: Journal Article
DOI: 10.1111/coin.70103
URL: https://onlinelibrary.wiley.com/doi/10.1111/coin.70103
Impact factor: 1.8
JCR: Q3 (Computer Science, Artificial Intelligence)
Category: Computer Science (region 4)
Open access: no
Citation count: 0
Platform: Semanticscholar

Abstract

We present the first comprehensive, end-to-end quantitative evaluation of the linguistic relativity hypothesis in AI-generated text, using ChatGPT-4o mini to generate responses to 10 culturally salient prompts across 13 typologically diverse languages. Semantic shifts were quantified using pairwise cosine similarity scores computed from multilingual MiniLM sentence embeddings. A one-way analysis of variance (ANOVA) reveals statistically significant variation in semantic alignment across language pairs, with $F(77, 702) = 2.153$, $p = 2.29 \times 10^{-7}$, and effect size $\eta^2 = 0.191$. These results are further supported by a non-parametric Kruskal–Wallis test yielding $H = 176.208$, $p = 9.59 \times 10^{-10}$, indicating robust differences in distribution. Prompt-specific semantic shifts also exhibit significant variation, as shown by ANOVA results $F(9, 770) = 24.239$, $p = 1.00 \times 10^{-36}$, and $\eta^2 = 0.221$. Sentiment polarity analysis using the Polyglot toolkit reveals significant effects of language on sentiment distribution, with $F(12, 117) = 2.637$, $p = 0.0037$, and $\eta^2 = 0.213$. Disaggregated analysis shows that positivity ratios differ by prompt ($F(9, 120) = 3.621$, $p = 0.0005$, $\eta^2 = 0.214$), while negativity scores display even greater divergence across prompts with $F(9, 120) = 12.755$, $p = 4.59 \times 10^{-14}$, and $\eta^2 = 0.489$. An unsupervised clustering procedure ($k = 3$) classifies languages into three distinct groups based on semantic alignment: (i) high-alignment (mean similarity $\ge 0.90$), (ii) intermediate (mean similarity $\approx 0.75$ to $0.85$), and (iii) neutral-tone clusters. Each group exhibits distinctive polarity profiles, with median sentiment polarity ranging from $-0.02$ to $0.11$. These results demonstrate that linguistic structures exert a measurable influence on AI-generated content, underscoring the need for culturally sensitive AI design practices. These results affirm that ChatGPT-4o mini's outputs align with the linguistic relativity hypothesis, clearly illustrating that language structures significantly shape AI-driven interpretation. All associated code and data are available in the GitHub repository: https://github.com/ParthaPRay/Liguistic_Relativity_Chatgpt.
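
The effect sizes quoted in the abstract can be cross-checked against the reported F statistics, because for a one-way ANOVA $\eta^2 = F \cdot df_1 / (F \cdot df_1 + df_2)$. The sketch below is an illustration and is not taken from the author's repository. It assumes a design that the abstract does not state explicitly but that matches the reported degrees of freedom: 78 unordered language pairs scored on each of 10 prompts (780 similarity scores) and one sentiment score per language and prompt (130 scores). The function name `eta_squared` and the row labels are hypothetical.

```python
from math import comb


def eta_squared(f_stat: float, df_between: int, df_within: int) -> float:
    """Recover eta squared from a one-way ANOVA F statistic and its degrees of freedom."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)


# Assumed design, inferred from the degrees of freedom in the abstract.
n_languages, n_prompts = 13, 10
n_pairs = comb(n_languages, 2)         # unordered language pairs
n_similarity = n_pairs * n_prompts     # one similarity score per pair and prompt
n_sentiment = n_languages * n_prompts  # one sentiment score per language and prompt

# (label, F, df_between, df_within, eta squared as reported in the abstract)
reported = [
    ("similarity by language pair", 2.153, n_pairs - 1, n_similarity - n_pairs, 0.191),
    ("similarity by prompt", 24.239, n_prompts - 1, n_similarity - n_prompts, 0.221),
    ("sentiment by language", 2.637, n_languages - 1, n_sentiment - n_languages, 0.213),
    ("positivity by prompt", 3.621, n_prompts - 1, n_sentiment - n_prompts, 0.214),
    ("negativity by prompt", 12.755, n_prompts - 1, n_sentiment - n_prompts, 0.489),
]

for label, f_stat, df1, df2, eta_reported in reported:
    eta = eta_squared(f_stat, df1, df2)
    print(f"{label}: F({df1}, {df2}) = {f_stat}, eta^2 = {eta:.3f} (reported {eta_reported})")
```

Under these assumptions each recomputed value rounds to the figure given in the abstract, which suggests the reported statistics are internally consistent.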
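
The abstract measures semantic shift as pairwise cosine similarity between sentence embeddings of the responses in different languages. A minimal sketch of that computation for a single prompt follows. The three-dimensional vectors and the language codes are made-up placeholders, not MiniLM output and not the languages used in the paper, and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(embeddings):
    """Map each unordered pair of keys to the cosine similarity of its vectors."""
    return {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for a, b in combinations(sorted(embeddings), 2)
    }


# Toy embeddings for one prompt; real sentence embeddings would have hundreds of dimensions.
toy_embeddings = {
    "en": [1.0, 0.0, 0.0],
    "hi": [1.0, 1.0, 0.0],
    "ja": [0.0, 1.0, 0.0],
}

scores = pairwise_similarities(toy_embeddings)
for (lang_a, lang_b), score in scores.items():
    print(f"{lang_a}-{lang_b}: {score:.3f}")
```

With 13 languages the same function yields 78 scores per prompt, which would then be the observations entering an ANOVA or Kruskal–Wallis test.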