Point-light Talkers: Multisensory Enhancement of Speech Tracking by Co-speech Movement Kinematics.

Impact Factor 3.0 · JCR Q2 (Neurosciences) · CAS Tier 3 (Medicine)
Jacob P Momsen, Seana Coulson
{"title":"Point-light Talkers: Multisensory Enhancement of Speech Tracking by Co-speech Movement Kinematics.","authors":"Jacob P Momsen, Seana Coulson","doi":"10.1162/jocn.a.62","DOIUrl":null,"url":null,"abstract":"<p><p>While multisensory super-additivity has been demonstrated in the context of visual articulation, it is unclear whether speech and co-speech gestures are similarly subject to super-additive integration. The current study investigates multisensory integration of speech and bodily gestures, testing whether biological motion signatures of co-speech gestures enhance cortical tracking of the speech envelope. We recorded EEG from 20 healthy adults as they watched a series of multimodal discourse clips from four conditions: AV congruent clips with co-speech gestures that were naturally aligned with speech, AV incongruent clips in which gestures were not aligned with the speech, audio-only clips in which speech was delivered in isolation, and video-only clips presenting the gesture content with no accompanying speech. As we hypothesize that the kinematics of co-speech gestures are sufficient to drive gestural enhancement of speech, our clips employed minimalistic \"point-light\" depictions of a speaker's movements: point-light talkers. Using neural decoder models to predict the amplitude of the speech envelope from EEG elicited in all four conditions, we compared speech reconstruction performance between multisensory (AV congruent) and additive models, that is, those representing the summed neural response across the two unisensory conditions. We found significant improvement in decoder scores for models trained on AV congruent trials relative to both audio-only and additive models. Forward models of brain activity indicated signatures of multisensory integration 140-160 msec following changes to the speech envelope. These results provide novel evidence for a multisensory enhancement effect of co-speech gesture kinematics on continuous speech tracking.</p>","PeriodicalId":51081,"journal":{"name":"Journal of Cognitive Neuroscience","volume":" ","pages":"1-16"},"PeriodicalIF":3.0000,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cognitive Neuroscience","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1162/jocn.a.62","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"NEUROSCIENCES","Score":null,"Total":0}
Citations: 0

Abstract

While multisensory super-additivity has been demonstrated in the context of visual articulation, it is unclear whether speech and co-speech gestures are similarly subject to super-additive integration. The current study investigates multisensory integration of speech and bodily gestures, testing whether biological motion signatures of co-speech gestures enhance cortical tracking of the speech envelope. We recorded EEG from 20 healthy adults as they watched a series of multimodal discourse clips from four conditions: AV congruent clips with co-speech gestures that were naturally aligned with speech, AV incongruent clips in which gestures were not aligned with the speech, audio-only clips in which speech was delivered in isolation, and video-only clips presenting the gesture content with no accompanying speech. As we hypothesize that the kinematics of co-speech gestures are sufficient to drive gestural enhancement of speech, our clips employed minimalistic "point-light" depictions of a speaker's movements: point-light talkers. Using neural decoder models to predict the amplitude of the speech envelope from EEG elicited in all four conditions, we compared speech reconstruction performance between multisensory (AV congruent) and additive models, that is, those representing the summed neural response across the two unisensory conditions. We found significant improvement in decoder scores for models trained on AV congruent trials relative to both audio-only and additive models. Forward models of brain activity indicated signatures of multisensory integration 140-160 msec following changes to the speech envelope. These results provide novel evidence for a multisensory enhancement effect of co-speech gesture kinematics on continuous speech tracking.
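
To make the analysis concrete, below is a minimal sketch of how a backward ("decoder") model of this kind can be set up: a ridge regression over time-lagged EEG channels that reconstructs the speech envelope, scored by the correlation between the reconstruction and the actual envelope. This follows common mTRF-style practice rather than the authors' exact pipeline; the function names, lag window, regularization strength, and the trailing usage example (including the additive comparison built by summing the unisensory EEG) are illustrative assumptions.

```python
# Illustrative sketch of speech-envelope reconstruction ("backward" decoding)
# from EEG with lagged ridge regression. Lag window, alpha, and all variable
# names are assumptions for demonstration, not the study's actual parameters.

import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr


def lag_matrix(eeg, min_lag, max_lag):
    """Stack time-lagged copies of each EEG channel.

    eeg: array of shape (n_samples, n_channels); lags are in samples.
    """
    n_samples, n_channels = eeg.shape
    lags = range(min_lag, max_lag + 1)
    X = np.zeros((n_samples, n_channels * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(eeg, -lag, axis=0)  # EEG at time t + lag
        if lag > 0:
            shifted[-lag:] = 0                # zero out wrapped-around samples
        elif lag < 0:
            shifted[:-lag] = 0
        X[:, i * n_channels:(i + 1) * n_channels] = shifted
    return X


def decode_envelope(eeg_train, env_train, eeg_test, env_test,
                    min_lag=0, max_lag=25, alpha=1e3):
    """Fit a ridge decoder on training data and score it on held-out data.

    Returns the Pearson correlation between the reconstructed and actual
    speech envelope (the "decoder score").
    """
    model = Ridge(alpha=alpha)
    model.fit(lag_matrix(eeg_train, min_lag, max_lag), env_train)
    reconstruction = model.predict(lag_matrix(eeg_test, min_lag, max_lag))
    r, _ = pearsonr(reconstruction, env_test)
    return r


# Hypothetical usage: compare a multisensory (AV congruent) decoder against an
# "additive" decoder trained on the sample-wise sum of the audio-only and
# video-only EEG responses. Super-additive integration would show up as
# r_av reliably exceeding r_additive.
# r_av = decode_envelope(eeg_av_train, env_train, eeg_av_test, env_test)
# r_additive = decode_envelope(eeg_audio_train + eeg_video_train, env_train,
#                              eeg_audio_test + eeg_video_test, env_test)
```
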

Source journal
Journal of Cognitive Neuroscience (Medicine - Neurosciences)
CiteScore: 5.30
Self-citation rate: 3.10%
Articles per year: 151
Review time: 3-8 weeks
About the journal: Journal of Cognitive Neuroscience investigates brain–behavior interaction and promotes lively interchange among the mind sciences.