Maansi Desai, Alyssa M. Field, Liberty S. Hamilton
{"title":"使用视听刺激及其单模态对应物的脑电图编码模型比较","authors":"Maansi Desai, Alyssa M. Field, Liberty S. Hamilton","doi":"10.1371/journal.pcbi.1012433","DOIUrl":null,"url":null,"abstract":"Communication in the real world is inherently multimodal. When having a conversation, typically sighted and hearing people use both auditory and visual cues to understand one another. For example, objects may make sounds as they move in space, or we may use the movement of a person’s mouth to better understand what they are saying in a noisy environment. Still, many neuroscience experiments rely on unimodal stimuli to understand encoding of sensory features in the brain. The extent to which visual information may influence encoding of auditory information and vice versa in natural environments is thus unclear. Here, we addressed this question by recording scalp electroencephalography (EEG) in 11 subjects as they listened to and watched movie trailers in audiovisual (AV), visual (V) only, and audio (A) only conditions. We then fit linear encoding models that described the relationship between the brain responses and the acoustic, phonetic, and visual information in the stimuli. We also compared whether auditory and visual feature tuning was the same when stimuli were presented in the original AV format versus when visual or auditory information was removed. In these stimuli, visual and auditory information was relatively uncorrelated, and included spoken narration over a scene as well as animated or live-action characters talking with and without their face visible. For this stimulus, we found that auditory feature tuning was similar in the AV and A-only conditions, and similarly, tuning for visual information was similar when stimuli were presented with the audio present (AV) and when the audio was removed (V only). In a cross prediction analysis, we investigated whether models trained on AV data predicted responses to A or V only test data similarly to models trained on unimodal data. Overall, prediction performance using AV training and V test sets was similar to using V training and V test sets, suggesting that the auditory information has a relatively smaller effect on EEG. In contrast, prediction performance using AV training and A only test set was slightly worse than using matching A only training and A only test sets. This suggests the visual information has a stronger influence on EEG, though this makes no qualitative difference in the derived feature tuning. In effect, our results show that researchers may benefit from the richness of multimodal datasets, which can then be used to answer more than one research question.","PeriodicalId":20241,"journal":{"name":"PLoS Computational Biology","volume":null,"pages":null},"PeriodicalIF":3.8000,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A comparison of EEG encoding models using audiovisual stimuli and their unimodal counterparts\",\"authors\":\"Maansi Desai, Alyssa M. Field, Liberty S. Hamilton\",\"doi\":\"10.1371/journal.pcbi.1012433\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Communication in the real world is inherently multimodal. When having a conversation, typically sighted and hearing people use both auditory and visual cues to understand one another. For example, objects may make sounds as they move in space, or we may use the movement of a person’s mouth to better understand what they are saying in a noisy environment. 
Still, many neuroscience experiments rely on unimodal stimuli to understand encoding of sensory features in the brain. The extent to which visual information may influence encoding of auditory information and vice versa in natural environments is thus unclear. Here, we addressed this question by recording scalp electroencephalography (EEG) in 11 subjects as they listened to and watched movie trailers in audiovisual (AV), visual (V) only, and audio (A) only conditions. We then fit linear encoding models that described the relationship between the brain responses and the acoustic, phonetic, and visual information in the stimuli. We also compared whether auditory and visual feature tuning was the same when stimuli were presented in the original AV format versus when visual or auditory information was removed. In these stimuli, visual and auditory information was relatively uncorrelated, and included spoken narration over a scene as well as animated or live-action characters talking with and without their face visible. For this stimulus, we found that auditory feature tuning was similar in the AV and A-only conditions, and similarly, tuning for visual information was similar when stimuli were presented with the audio present (AV) and when the audio was removed (V only). In a cross prediction analysis, we investigated whether models trained on AV data predicted responses to A or V only test data similarly to models trained on unimodal data. Overall, prediction performance using AV training and V test sets was similar to using V training and V test sets, suggesting that the auditory information has a relatively smaller effect on EEG. In contrast, prediction performance using AV training and A only test set was slightly worse than using matching A only training and A only test sets. This suggests the visual information has a stronger influence on EEG, though this makes no qualitative difference in the derived feature tuning. In effect, our results show that researchers may benefit from the richness of multimodal datasets, which can then be used to answer more than one research question.\",\"PeriodicalId\":20241,\"journal\":{\"name\":\"PLoS Computational Biology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2024-09-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PLoS Computational Biology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1371/journal.pcbi.1012433\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLoS Computational Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1371/journal.pcbi.1012433","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
A comparison of EEG encoding models using audiovisual stimuli and their unimodal counterparts
Communication in the real world is inherently multimodal. When having a conversation, typically sighted and hearing people use both auditory and visual cues to understand one another. For example, objects may make sounds as they move in space, or we may use the movement of a person’s mouth to better understand what they are saying in a noisy environment. Still, many neuroscience experiments rely on unimodal stimuli to understand encoding of sensory features in the brain. The extent to which visual information may influence encoding of auditory information, and vice versa, in natural environments is thus unclear. Here, we addressed this question by recording scalp electroencephalography (EEG) in 11 subjects as they listened to and watched movie trailers in audiovisual (AV), visual-only (V), and audio-only (A) conditions. We then fit linear encoding models that described the relationship between the brain responses and the acoustic, phonetic, and visual information in the stimuli. We also compared whether auditory and visual feature tuning was the same when stimuli were presented in the original AV format versus when the visual or auditory information was removed. In these stimuli, visual and auditory information was relatively uncorrelated, and included spoken narration over a scene as well as animated or live-action characters talking with and without their faces visible. For these stimuli, we found that auditory feature tuning was similar in the AV and A-only conditions; likewise, tuning for visual information was similar when stimuli were presented with the audio present (AV) and when the audio was removed (V-only). In a cross-prediction analysis, we investigated whether models trained on AV data predicted responses to A-only or V-only test data similarly to models trained on unimodal data. Overall, prediction performance using AV training and V-only test sets was similar to that using V-only training and V-only test sets, suggesting that auditory information has a relatively smaller effect on the EEG. In contrast, prediction performance using AV training and A-only test sets was slightly worse than using matched A-only training and A-only test sets. This suggests that visual information has a stronger influence on the EEG, although it makes no qualitative difference in the derived feature tuning. In sum, our results show that researchers may benefit from the richness of multimodal datasets, which can then be used to answer more than one research question.
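Linear encoding models of the kind described above are typically implemented as time-lagged (temporal receptive field) regressions from stimulus features to EEG. The sketch below is a minimal illustration of that idea and of the cross-condition prediction comparison, using synthetic data and hypothetical dimensions in place of the real EEG recordings and stimulus features; it is not the authors' analysis code.

```python
# Minimal sketch (not the authors' code) of a time-lagged ridge-regression
# encoding model and a cross-condition prediction test. All data here are
# synthetic stand-ins, so the resulting correlations will be near zero;
# the point is the pipeline, not the numbers.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def lag_features(stim, n_lags):
    """Stack time-lagged copies of the stimulus features (T x F) into a
    design matrix (T x F*n_lags), zero-padding at the start."""
    T, F = stim.shape
    X = np.zeros((T, F * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * F:(lag + 1) * F] = stim[:T - lag]
    return X

# Hypothetical dimensions: 10,000 samples, 10 stimulus features
# (e.g., spectrogram bands, phonetic or visual features), 64 EEG channels.
T, F, C, n_lags = 10_000, 10, 64, 25
stim_av = rng.standard_normal((T, F))   # features from the AV condition
stim_a  = rng.standard_normal((T, F))   # features from the A-only condition
eeg_av  = rng.standard_normal((T, C))   # EEG recorded in the AV condition
eeg_a   = rng.standard_normal((T, C))   # EEG recorded in the A-only condition

X_av = lag_features(stim_av, n_lags)
X_a  = lag_features(stim_a, n_lags)

# Train on the first 80% of each condition; hold out the last 20% of the
# A-only data as the common test set for the cross-condition comparison.
split = int(0.8 * T)
model_av = Ridge(alpha=1.0).fit(X_av[:split], eeg_av[:split])
model_a  = Ridge(alpha=1.0).fit(X_a[:split], eeg_a[:split])

X_test, y_test = X_a[split:], eeg_a[split:]
for name, model in [("AV-trained", model_av), ("A-only-trained", model_a)]:
    pred = model.predict(X_test)
    r = [np.corrcoef(pred[:, c], y_test[:, c])[0, 1] for c in range(C)]
    print(f"{name}: mean prediction correlation on A-only test data r = {np.mean(r):.3f}")
```

With real data, the regressors would be the acoustic, phonetic, and visual features described in the abstract, the ridge penalty would be chosen by cross-validation, and the comparison of interest is whether AV-trained and unimodal-trained models reach similar held-out prediction correlations.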
Journal introduction:
PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales—from molecules and cells, to patient populations and ecosystems—through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery.
Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines.
Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights.
Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology.
Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.