Fusion confusion: exploring ambisonic spatial localisation for audio-visual immersion using the McGurk effect

Abubakr Siddig, Alessandro Ragano, Hamed Z. Jahromi, Andrew Hines
Published in: Proceedings of the 11th ACM Workshop on Immersive Mixed and Virtual Environment Systems, 2019-06-18
DOI: 10.1145/3304113.3326112 (https://doi.org/10.1145/3304113.3326112)
Citations: 7

Abstract

Virtual Reality (VR) is attracting the attention of application developers for purposes beyond entertainment, including serious games, health, education and training. By including 3D audio, the overall VR quality of experience (QoE) will be enhanced through greater immersion. A better understanding of the perception of spatial audio localisation in audio-visual immersion is needed, especially in streaming applications where bandwidth is limited and compression is required. This paper explores the impact of audio-visual fusion on speech due to mismatches between a perceived talker location and the corresponding sound, using a phenomenon known as the McGurk effect and binaurally rendered Ambisonic spatial audio. The McGurk illusion occurs when the sound of one syllable, paired with a video of a second syllable, gives the perception of a third syllable. For instance, the sound of /ba/ dubbed onto a video of /ga/ leads to the illusion of hearing /da/. Several studies have investigated factors involved in the McGurk effect, but little has been done to understand the effect of audio spatialisation on this illusion. 3D spatial audio generated with Ambisonics has been shown to provide satisfactory QoE with respect to localisation of sound sources, which makes it suitable for VR applications but not for audio-visual talker scenarios. To test the perception of the McGurk effect at different directions of arrival (DOA) of sound, we rendered Ambisonics signals at azimuths of 0°, 30°, 60°, and 90° to both the left and right of the video source. The results show that audio-visual fusion significantly affects the perception of speech, yet the spatial audio does not significantly impact the illusion. This finding suggests that precise localisation of speech audio might not be as critical for speech intelligibility. It was found that a more significant factor was the intelligibility of the speech itself.
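The azimuth conditions described in the abstract can be illustrated with a minimal sketch of how a mono speech sample might be encoded into an Ambisonic sound field at each direction of arrival. The abstract does not state the Ambisonic order, channel weighting, or sign convention used, so everything here is an assumption: first-order B-format, FuMa-style weighting of the W channel, positive azimuth to the listener's left, and zero elevation. The function names `encode_foa` and `encode_all` are hypothetical, and the binaural rendering stage used in the study is not shown.

```python
import math

# Azimuths named in the abstract, in degrees. Counting 0 degrees once gives
# seven directions. Sign convention (positive = left) is an assumption.
AZIMUTHS_DEG = [-90, -60, -30, 0, 30, 60, 90]


def encode_foa(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumes FuMa-style weighting (W scaled by 1/sqrt(2)); the paper's
    actual encoder settings are not given in the abstract.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back component
    y = sample * math.sin(az) * math.cos(el)  # left-right component
    z = sample * math.sin(el)                 # up-down component
    return (w, x, y, z)


def encode_all(sample=1.0):
    """Return the B-format encoding of one sample for every test azimuth."""
    return {az: encode_foa(sample, az) for az in AZIMUTHS_DEG}


if __name__ == "__main__":
    for az, channels in encode_all().items():
        print(az, [round(c, 3) for c in channels])
```

In this sketch a source at 0° puts all of its directional energy into X, while a source at ±90° puts it into Y with opposite signs for left and right, which is the kind of audio-visual position mismatch the experiment varies.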