Lexical-Semantic Content, Not Syntactic Structure, Is the Main Contributor to ANN-Brain Similarity of fMRI Responses in the Language Network

Impact factor: 3.6 | JCR Q1 (Linguistics)
Carina Kauf, Greta Tuckute, Roger P. Levy, Jacob Andreas, Evelina Fedorenko
Journal: Neurobiology of Language
Published: 2023-09-21
DOI: 10.1162/nol_a_00116 (https://doi.org/10.1162/nol_a_00116)
Citations: 2

Abstract

Representations from artificial neural network (ANN) language models have been shown to predict human brain activity in the language network. To understand what aspects of linguistic stimuli contribute to ANN-to-brain similarity, we used an fMRI data set of responses to n = 627 naturalistic English sentences (Pereira et al., 2018) and systematically manipulated the stimuli for which ANN representations were extracted. In particular, we (i) perturbed sentences’ word order, (ii) removed different subsets of words, or (iii) replaced sentences with other sentences of varying semantic similarity. We found that the lexical-semantic content of the sentence (largely carried by content words) rather than the sentence’s syntactic form (conveyed via word order or function words) is primarily responsible for the ANN-to-brain similarity. In follow-up analyses, we found that perturbation manipulations that adversely affect brain predictivity also lead to more divergent representations in the ANN’s embedding space and decrease the ANN’s ability to predict upcoming tokens in those stimuli. Further, the results are robust to whether the mapping model is trained on intact or perturbed stimuli and to whether the ANN sentence representations are conditioned on the same linguistic context that humans saw. The critical result, that lexical-semantic content is the main contributor to the similarity between ANN representations and neural ones, aligns with the idea that the goal of the human language system is to extract meaning from linguistic strings. Finally, this work highlights the strength of systematic experimental manipulations for evaluating how close we are to accurate and generalizable models of the human language network.
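The three stimulus manipulations named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the tiny `FUNCTION_WORDS` list, the whitespace tokenizer, and the random choice of replacement sentence are all assumptions made for illustration. In particular, the paper selects replacement sentences by varying semantic similarity, which this sketch does not model.

```python
import random

# Hypothetical, deliberately tiny function-word list (illustration only).
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "to", "and", "or", "is", "are",
    "was", "were", "that", "with", "for", "by", "it", "from",
}


def tokenize(sentence):
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    tokens = (w.strip(".,;:!?\"'()").lower() for w in sentence.split())
    return [t for t in tokens if t]


def scramble_word_order(sentence, seed=0):
    """(i) Perturb word order: shuffle the words, keeping the same word set."""
    words = tokenize(sentence)
    random.Random(seed).shuffle(words)
    return words


def keep_content_words(sentence):
    """(ii) Remove a word subset: drop function words, keep content words."""
    return [w for w in tokenize(sentence) if w not in FUNCTION_WORDS]


def keep_function_words(sentence):
    """(ii) Remove a word subset: drop content words, keep function words."""
    return [w for w in tokenize(sentence) if w in FUNCTION_WORDS]


def replace_sentence(index, sentences, seed=0):
    """(iii) Replace a sentence with a different one from the pool.

    Assumption: the replacement is chosen at random here, whereas the
    paper controls the semantic similarity of the replacement.
    """
    candidates = [i for i in range(len(sentences)) if i != index]
    return sentences[random.Random(seed).choice(candidates)]
```

Each perturbed version would then be passed to the language model in place of the original sentence, and the resulting representations compared on how well they predict the fMRI responses.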
Source journal: Neurobiology of Language (Social Sciences: Linguistics and Language)
CiteScore: 5.90
Self-citation rate: 6.20%
Articles published: 32
Review time: 17 weeks