Danqing Zhao, Shuyi Xin, Lechen Liu, Yihan Sun, Anqi Du
{"title":"增强增强现实中的音频感知:动态人声信息处理框架","authors":"Danqing Zhao, Shuyi Xin, Lechen Liu, Yihan Sun, Anqi Du","doi":"10.1117/12.3014440","DOIUrl":null,"url":null,"abstract":"The development of the Metaverse nowadays has sparked widespread emotions among researchers, and correspondingly, many technologies have been derived to improve the human's sense of reality in the Metaverse. Especially, Extended Reality (XR), as an indispensable and important technology and research direction in the study of the metaverse, aims to bring seamless transformation between the virtual world and the real-world immersion to the experiential world. However, the technology we currently lack is the ability to simultaneously separate, classify, and locate dynamic human sound information to enhance human sound perception in complex noise environments. This article proposes a framework that utilizes FCNN for separation, algebraic models for positioning to obtain estimated distances, and SVM for classification. The dataset is built to simulates distance-related changes with accurate ground truth labels. The results show that our method can effectively separate, separate, and locate mixed sound data, providing users with comprehensive information about the content, gender, and distance of the speaking object in complex sound environments, enhancing their immersive experience and perception ability. Our innovation lies in the combination of three audio processing technologies and the framework proposed may well inspire future work on related topics.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":" 22","pages":"129691Z - 129691Z-9"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancing audio perception in augmented reality: a dynamic vocal information processing framework\",\"authors\":\"Danqing Zhao, Shuyi Xin, Lechen Liu, Yihan Sun, Anqi Du\",\"doi\":\"10.1117/12.3014440\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The development of the Metaverse nowadays has sparked widespread emotions among researchers, and correspondingly, many technologies have been derived to improve the human's sense of reality in the Metaverse. Especially, Extended Reality (XR), as an indispensable and important technology and research direction in the study of the metaverse, aims to bring seamless transformation between the virtual world and the real-world immersion to the experiential world. However, the technology we currently lack is the ability to simultaneously separate, classify, and locate dynamic human sound information to enhance human sound perception in complex noise environments. This article proposes a framework that utilizes FCNN for separation, algebraic models for positioning to obtain estimated distances, and SVM for classification. The dataset is built to simulates distance-related changes with accurate ground truth labels. The results show that our method can effectively separate, separate, and locate mixed sound data, providing users with comprehensive information about the content, gender, and distance of the speaking object in complex sound environments, enhancing their immersive experience and perception ability. Our innovation lies in the combination of three audio processing technologies and the framework proposed may well inspire future work on related topics.\",\"PeriodicalId\":516634,\"journal\":{\"name\":\"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)\",\"volume\":\" 22\",\"pages\":\"129691Z - 129691Z-9\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.3014440\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.3014440","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Enhancing audio perception in augmented reality: a dynamic vocal information processing framework
The development of the Metaverse nowadays has sparked widespread emotions among researchers, and correspondingly, many technologies have been derived to improve the human's sense of reality in the Metaverse. Especially, Extended Reality (XR), as an indispensable and important technology and research direction in the study of the metaverse, aims to bring seamless transformation between the virtual world and the real-world immersion to the experiential world. However, the technology we currently lack is the ability to simultaneously separate, classify, and locate dynamic human sound information to enhance human sound perception in complex noise environments. This article proposes a framework that utilizes FCNN for separation, algebraic models for positioning to obtain estimated distances, and SVM for classification. The dataset is built to simulates distance-related changes with accurate ground truth labels. The results show that our method can effectively separate, separate, and locate mixed sound data, providing users with comprehensive information about the content, gender, and distance of the speaking object in complex sound environments, enhancing their immersive experience and perception ability. Our innovation lies in the combination of three audio processing technologies and the framework proposed may well inspire future work on related topics.