Carina Kauf, Greta Tuckute, Roger P. Levy, Jacob Andreas, Evelina Fedorenko
{"title":"词汇-语义内容,而不是句法结构,是语言网络中fMRI反应的ANN-Brain相似性的主要贡献者","authors":"Carina Kauf, Greta Tuckute, Roger P. Levy, Jacob Andreas, Evelina Fedorenko","doi":"10.1162/nol_a_00116","DOIUrl":null,"url":null,"abstract":"Abstract Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI data set of responses to n = 627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we (i) perturbed sentences’ word order, (ii) removed different subsets of words, or (iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical-semantic content of the sentence (largely carried by content words) rather than the sentence’s syntactic form (conveyed via word order or function words) is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN’s embedding space and decrease the ANN’s ability to predict upcoming tokens in those stimuli. Further, results are robust as to whether the mapping model is trained on intact or perturbed stimuli and whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result—that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones—aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.","PeriodicalId":34845,"journal":{"name":"Neurobiology of Language","volume":"31 1","pages":"0"},"PeriodicalIF":3.6000,"publicationDate":"2023-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Lexical-Semantic Content, Not Syntactic Structure, Is the Main Contributor to ANN-Brain Similarity of fMRI Responses in the Language Network\",\"authors\":\"Carina Kauf, Greta Tuckute, Roger P. Levy, Jacob Andreas, Evelina Fedorenko\",\"doi\":\"10.1162/nol_a_00116\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI data set of responses to n = 627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we (i) perturbed sentences’ word order, (ii) removed different subsets of words, or (iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical-semantic content of the sentence (largely carried by content words) rather than the sentence’s syntactic form (conveyed via word order or function words) is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN’s embedding space and decrease the ANN’s ability to predict upcoming tokens in those stimuli. Further, results are robust as to whether the mapping model is trained on intact or perturbed stimuli and whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result—that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones—aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.\",\"PeriodicalId\":34845,\"journal\":{\"name\":\"Neurobiology of Language\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":3.6000,\"publicationDate\":\"2023-09-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurobiology of Language\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1162/nol_a_00116\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"LINGUISTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurobiology of Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1162/nol_a_00116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"LINGUISTICS","Score":null,"Total":0}
引用次数: 2
摘要
人工神经网络(ANN)语言模型的表征已被证明可以在语言网络中预测人脑活动。为了了解语言刺激的哪些方面有助于神经网络与大脑的相似性,我们使用了对n = 627个自然英语句子(Pereira et al., 2018)的反应的功能磁共振成像数据集,并系统地操纵了提取神经网络表征的刺激。特别是,我们(i)扰乱句子的词序,(ii)删除不同的词子集,或(iii)用语义相似度不同的其他句子替换句子。我们发现,句子的词汇语义内容(主要由实词承载)而不是句子的句法形式(通过词序或虚词传达)是人工神经网络与大脑相似度的主要原因。在后续分析中,我们发现对大脑预测产生不利影响的扰动操作也会导致人工神经网络嵌入空间中出现更多不同的表征,并降低人工神经网络预测这些刺激中即将到来的标记的能力。此外,对于映射模型是在完整的还是扰动的刺激上训练的,以及人工神经网络的句子表示是否以人类看到的相同的语言语境为条件,结果是鲁棒的。关键的结果——词汇语义内容是人工神经网络表示和神经网络表示之间相似性的主要贡献者——与人类语言系统的目标是从语言字符串中提取意义的想法一致。最后,这项工作强调了系统实验操作的强度,以评估我们离人类语言网络的准确和可推广模型有多近。
Lexical-Semantic Content, Not Syntactic Structure, Is the Main Contributor to ANN-Brain Similarity of fMRI Responses in the Language Network
Abstract Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI data set of responses to n = 627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we (i) perturbed sentences’ word order, (ii) removed different subsets of words, or (iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical-semantic content of the sentence (largely carried by content words) rather than the sentence’s syntactic form (conveyed via word order or function words) is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN’s embedding space and decrease the ANN’s ability to predict upcoming tokens in those stimuli. Further, results are robust as to whether the mapping model is trained on intact or perturbed stimuli and whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result—that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones—aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.