Enhancing audio perception in augmented reality: a dynamic vocal information processing framework

Danqing Zhao, Shuyi Xin, Lechen Liu, Yihan Sun, Anqi Du
International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023), vol. 22, pp. 129691Z-129691Z-9. Published 2024-01-09. DOI: 10.1117/12.3014440 (https://doi.org/10.1117/12.3014440). Citations: 0.

Abstract

The development of the Metaverse has attracted widespread interest among researchers, and many technologies have been developed to improve users' sense of reality in it. In particular, Extended Reality (XR), an indispensable technology and research direction in the study of the Metaverse, aims to provide seamless transitions between the virtual world and the real world together with an immersive experience. What is currently lacking, however, is the ability to simultaneously separate, classify, and locate dynamic human sound information in order to enhance human sound perception in complex noise environments. This article proposes a framework that uses an FCNN for separation, algebraic models for positioning to obtain estimated distances, and an SVM for classification. The dataset is built to simulate distance-related changes with accurate ground-truth labels. The results show that our method can effectively separate, classify, and locate mixed sound data, providing users with comprehensive information about the content, gender, and distance of the speaker in complex sound environments and enhancing their immersive experience and perception ability. Our innovation lies in the combination of three audio processing technologies, and the proposed framework may inspire future work on related topics.
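The abstract does not say which algebraic model is used for positioning or how the distance-labelled dataset is generated. As a rough illustration only, the sketch below assumes free-field 1/r amplitude decay: a reference signal recorded at a known distance is rescaled to simulate another distance, and the distance is then recovered by inverting the same law. The function names, the 220 Hz test tone, and the 1 m reference distance are all hypothetical and are not taken from the paper.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def simulate_at_distance(reference, ref_distance_m, distance_m):
    """Rescale a signal recorded at ref_distance_m as if heard at distance_m.

    Assumption: free-field 1/r amplitude decay, no reverberation, no noise.
    """
    gain = ref_distance_m / distance_m
    return [gain * s for s in reference]


def estimate_distance(observed, reference_rms, ref_distance_m):
    """Invert the assumed 1/r law: r = r_ref * RMS_ref / RMS_observed."""
    return ref_distance_m * reference_rms / rms(observed)


# Hypothetical reference: one second of a 220 Hz tone at 16 kHz, "recorded" at 1 m.
SAMPLE_RATE = 16_000
reference = [
    math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)
]
reference_rms = rms(reference)

# Simulate the same source at 4 m; the known distance is the ground-truth label.
true_distance_m = 4.0
heard = simulate_at_distance(reference, ref_distance_m=1.0, distance_m=true_distance_m)

# Recover the distance from the observed level alone.
estimated = estimate_distance(heard, reference_rms, ref_distance_m=1.0)
print(f"true: {true_distance_m} m, estimated: {estimated:.3f} m")
```

In a real mixture the level of each speaker is only available after separation, so an estimate of this kind would have to run on the separated signals, and noise or reverberation would break the simple 1/r assumption.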