Subword representations successfully decode brain responses to morphologically complex written words

IF 3.6 Q1 LINGUISTICS
Tero Hakala, Tiina Lindh-Knuutila, Annika Hultén, Minna Lehtonen, R. Salmelin
{"title":"Subword representations successfully decode brain responses to morphologically complex written words","authors":"Tero Hakala, Tiina Lindh-Knuutila, Annika Hultén, Minna Lehtonen, R. Salmelin","doi":"10.1162/nol_a_00149","DOIUrl":null,"url":null,"abstract":"\n This study extends the idea of decoding word-evoked brain activations using a corpus-semantic vector space to multimorphemic words in the agglutinative Finnish language. The corpus-semantic models are trained on word segments, and decoding is carried out with word vectors that are composed of these segments. We tested several alternative vector-space models using different segmentations: no segmentation (whole word), linguistic morphemes, statistical morphemes, random segmentation, and character-level 1-, 2- and 3-grams, and paired them with recorded MEG responses to multimorphemic words in a visual word recognition task. For all variants, the decoding accuracy exceeded the standard word-label permutation-based significance thresholds at 350--500 ms after stimulus onset. However, the critical segment-label permutation test revealed that only those segmentations that were morphologically aware reached significance in the brain decoding task. The results suggest that both whole-word forms and morphemes are represented in the brain and show that neural decoding using corpus-semantic word representations derived from compositional subword segments is applicable also for multimorphemic word forms. This is especially relevant for languages with complex morphology, because a large proportion of word forms are rare and it can be difficult to find statistically reliable surface representations for them in any large corpus.","PeriodicalId":34845,"journal":{"name":"Neurobiology of Language","volume":null,"pages":null},"PeriodicalIF":3.6000,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurobiology of Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1162/nol_a_00149","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"LINGUISTICS","Score":null,"Total":0}
引用次数: 0

Abstract

This study extends the idea of decoding word-evoked brain activations using a corpus-semantic vector space to multimorphemic words in the agglutinative Finnish language. The corpus-semantic models are trained on word segments, and decoding is carried out with word vectors that are composed of these segments. We tested several alternative vector-space models using different segmentations: no segmentation (whole word), linguistic morphemes, statistical morphemes, random segmentation, and character-level 1-, 2- and 3-grams, and paired them with recorded MEG responses to multimorphemic words in a visual word recognition task. For all variants, the decoding accuracy exceeded the standard word-label permutation-based significance thresholds at 350--500 ms after stimulus onset. However, the critical segment-label permutation test revealed that only those segmentations that were morphologically aware reached significance in the brain decoding task. The results suggest that both whole-word forms and morphemes are represented in the brain and show that neural decoding using corpus-semantic word representations derived from compositional subword segments is applicable also for multimorphemic word forms. This is especially relevant for languages with complex morphology, because a large proportion of word forms are rare and it can be difficult to find statistically reliable surface representations for them in any large corpus.
子词表征成功解码大脑对形态复杂的书面词语的反应
这项研究将利用语料-语义向量空间对单词诱发的大脑激活进行解码的想法扩展到了多词素芬兰语中。语料库-语义模型在词段上进行训练,解码则通过由这些词段组成的词向量进行。我们使用不同的词段测试了几种可供选择的向量空间模型:无词段(整词)、语言词素、统计词素、随机词段以及字符级 1-、2-和 3-词素,并将它们与视觉单词识别任务中记录的多词素单词的 MEG 反应配对。对于所有变体,在刺激开始后350-500毫秒时,解码准确率都超过了基于词标签排列的标准显著性阈值。然而,临界词段标签排列测试表明,只有那些具有形态意识的词段才能在大脑解码任务中达到显著性。这些结果表明,全词形式和词素都能在大脑中得到表征,并表明使用由组成性子词片段衍生的语料库-语义词表征进行神经解码也适用于多词素词形式。这对于形态复杂的语言尤为重要,因为很大一部分词形是罕见的,很难在任何大型语料库中找到统计上可靠的表面表征。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Neurobiology of Language
Neurobiology of Language Social Sciences-Linguistics and Language
CiteScore
5.90
自引率
6.20%
发文量
32
审稿时长
17 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信