Neural Tracking of Speech Acoustics in Noise Is Coupled with Lexical Predictability as Estimated by Large Language Models.

IF 2.7 · JCR Q3 (Neuroscience) · CAS Tier 3 (Medicine)
eNeuro · Pub Date: 2024-08-20 · Print Date: 2024-08-01 · DOI: 10.1523/ENEURO.0507-23.2024
Paul Iverson, Jieun Song
Citations: 0

Abstract

Adults heard recordings of two spatially separated speakers reading newspaper and magazine articles. They were asked to listen to one of them and ignore the other, and EEG was recorded to assess their neural processing. Machine learning extracted neural sources that tracked the target and distractor speakers at three levels: the acoustic envelope of speech (delta- and theta-band modulations), lexical frequency for individual words, and the contextual predictability of individual words estimated by GPT-4 and earlier lexical models. To provide a broader view of speech perception, half of the subjects completed a simultaneous visual task, and the listeners included both native and non-native English speakers. Distinct neural components were extracted for these levels of auditory and lexical processing, demonstrating that native English speakers had greater target-distractor separation compared with non-native English speakers on most measures, and that lexical processing was reduced by the visual task. Moreover, there was a novel interaction of lexical predictability and frequency with auditory processing; acoustic tracking was stronger for lexically harder words, suggesting that people listened harder to the acoustics when needed for lexical selection. This demonstrates that speech perception is not simply a feedforward process from acoustic processing to the lexicon. Rather, the adaptable context-sensitive processing long known to occur at a lexical level has broader consequences for perception, coupling with the acoustic tracking of individual speakers in noise.
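The "contextual predictability" used in the study is, in information-theoretic terms, word surprisal: the negative log probability a language model assigns to each word given its preceding context. The paper obtained these probabilities from GPT-4; the sketch below only illustrates the surprisal computation itself, with hypothetical per-word probabilities standing in for real model output.

```python
import math

def surprisal(prob):
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(prob)

# Hypothetical probabilities a causal language model might assign to each
# word given its preceding context (the study used GPT-4 for this step;
# these numbers are illustrative, not taken from the paper).
probs = {"the": 0.20, "committee": 0.003, "met": 0.05}

scores = {w: surprisal(p) for w, p in probs.items()}
# Less probable words yield higher surprisal; the paper's finding is that
# acoustic tracking was stronger for these lexically harder words.
```

Any model that scores P(word | context) can be dropped in for the hypothetical `probs` dictionary, which is what allows GPT-4 to be compared against the earlier lexical models mentioned in the abstract.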

Significance Statement

In challenging listening conditions, people use focused attention to help them understand an individual talker while ignoring others, which alters their neural processing of speech at both auditory and lexical levels. However, lexical processing of natural materials (e.g., conversations, audiobooks) has been difficult to measure, owing to limitations in the tools available for estimating the predictability of individual words within longer passages. The present study used a contemporary large language model, GPT-4, to estimate word predictability and demonstrated that listeners adjust their auditory neural processing online based on these predictions: neural activity tracked the target talker's acoustics more closely when words were less predictable from context.
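The acoustic-envelope tracking described in the abstract is conventionally measured with a temporal response function (TRF): a ridge-regularized linear kernel mapping time-lagged copies of the speech envelope onto the EEG, with tracking strength taken as the correlation between predicted and recorded EEG. The following is a minimal sketch of that analysis on synthetic data; the lag count, regularization value, and single simulated channel are assumptions for illustration, not the study's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a speech envelope and one EEG channel (the study
# used real delta/theta-band envelopes and multichannel EEG).
n, lags = 2000, 16
env = rng.standard_normal(n)
true_trf = np.exp(-np.arange(lags) / 4.0)  # decaying response kernel

# Design matrix of time-lagged envelope copies.
X = np.column_stack([np.roll(env, k) for k in range(lags)])
eeg = X @ true_trf + 0.5 * rng.standard_normal(n)

# Ridge-regularized TRF estimate (mTRF-style analysis).
lam = 1.0
trf = np.linalg.solve(X.T @ X + lam * np.eye(lags), X.T @ eeg)

# Tracking strength: correlation between predicted and recorded EEG.
pred = X @ trf
r = np.corrcoef(pred, eeg)[0, 1]
```

In the study's framing, `r` (or an analogous source-level tracking measure) is what increases for lexically harder words, coupling acoustic tracking with lexical predictability.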
Source journal

eNeuro (Neuroscience — General Neuroscience)
CiteScore: 5.00
Self-citation rate: 2.90%
Articles per year: 486
Review time: 16 weeks

Journal description: An open-access journal from the Society for Neuroscience, eNeuro publishes high-quality, broad-based, peer-reviewed research focused solely on the field of neuroscience. eNeuro embodies an emerging scientific vision that offers a new experience for authors and readers, all in support of the Society's mission to advance understanding of the brain and nervous system.