Ankur Mohan, R. Duraiswami, D. Zotkin, D. DeMenthon, L. Davis
{"title":"使用计算机视觉生成定制的空间音频","authors":"Ankur Mohan, R. Duraiswami, D. Zotkin, D. DeMenthon, L. Davis","doi":"10.1109/ICME.2003.1221247","DOIUrl":null,"url":null,"abstract":"Creating high quality virtual spatial audio over headphones requires real-time head tracking, personalized head-related transfer functions (HRTFs) and customized room response models. While there are expensive solutions to address these issues based on costly head trackers, measured personalized HRTFs and room responses, these are not suitable for widespread or easy deployment and use. We report on the development of a system that uses computer vision to produce customizable models for both the HRTF and the room response, and to achieve head-tracking. The system uses relatively inexpensive cameras and widely available personal computers. Computer-vision based anthropometric measurements of the head, torso, and the external ears are used for HRTF customization. For low-frequency HRTF customization we employ a simple head-and-torso model developed recently [V. R. Algazi et al., 2002]. For high frequency customization we employ measured pinna characteristics as an index into a database of HRTFs [D. N. Zotkin et al., 2002]. For head tracking we employ an online implementation of the POSIT algorithm [D. DeMenthon and L. Davis, 1995] along with active markers to compute head pose in real-time. The system provides an enhanced virtual listening experience at low cost.","PeriodicalId":118560,"journal":{"name":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Using computer vision to generate customized spatial audio\",\"authors\":\"Ankur Mohan, R. Duraiswami, D. Zotkin, D. DeMenthon, L. Davis\",\"doi\":\"10.1109/ICME.2003.1221247\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Creating high quality virtual spatial audio over headphones requires real-time head tracking, personalized head-related transfer functions (HRTFs) and customized room response models. While there are expensive solutions to address these issues based on costly head trackers, measured personalized HRTFs and room responses, these are not suitable for widespread or easy deployment and use. We report on the development of a system that uses computer vision to produce customizable models for both the HRTF and the room response, and to achieve head-tracking. The system uses relatively inexpensive cameras and widely available personal computers. Computer-vision based anthropometric measurements of the head, torso, and the external ears are used for HRTF customization. For low-frequency HRTF customization we employ a simple head-and-torso model developed recently [V. R. Algazi et al., 2002]. For high frequency customization we employ measured pinna characteristics as an index into a database of HRTFs [D. N. Zotkin et al., 2002]. For head tracking we employ an online implementation of the POSIT algorithm [D. DeMenthon and L. Davis, 1995] along with active markers to compute head pose in real-time. The system provides an enhanced virtual listening experience at low cost.\",\"PeriodicalId\":118560,\"journal\":{\"name\":\"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-07-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICME.2003.1221247\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2003.1221247","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14
摘要
通过耳机创建高质量的虚拟空间音频需要实时头部跟踪、个性化头部相关传递函数(hrtf)和定制的房间响应模型。虽然有昂贵的解决方案来解决这些问题,基于昂贵的头部追踪器,测量个性化hrtf和房间反应,这些都不适合广泛或易于部署和使用。我们报告了一种系统的开发,该系统使用计算机视觉为HRTF和房间响应生成可定制的模型,并实现头部跟踪。该系统使用相对便宜的摄像头和广泛使用的个人电脑。基于计算机视觉的头部、躯干和外耳的人体测量用于HRTF定制。对于低频HRTF定制,我们采用了最近开发的简单头部和躯干模型[V]。R. Algazi等,2002]。对于高频定制,我们采用测量的耳廓特征作为hrtf数据库的索引[D]。N. Zotkin et al., 2002]。对于头部跟踪,我们采用了POSIT算法的在线实现[D]。DeMenthon和L. Davis, 1995]与主动标记一起实时计算头部姿势。该系统以低成本提供了增强的虚拟聆听体验。
Using computer vision to generate customized spatial audio
Creating high quality virtual spatial audio over headphones requires real-time head tracking, personalized head-related transfer functions (HRTFs) and customized room response models. While there are expensive solutions to address these issues based on costly head trackers, measured personalized HRTFs and room responses, these are not suitable for widespread or easy deployment and use. We report on the development of a system that uses computer vision to produce customizable models for both the HRTF and the room response, and to achieve head-tracking. The system uses relatively inexpensive cameras and widely available personal computers. Computer-vision based anthropometric measurements of the head, torso, and the external ears are used for HRTF customization. For low-frequency HRTF customization we employ a simple head-and-torso model developed recently [V. R. Algazi et al., 2002]. For high frequency customization we employ measured pinna characteristics as an index into a database of HRTFs [D. N. Zotkin et al., 2002]. For head tracking we employ an online implementation of the POSIT algorithm [D. DeMenthon and L. Davis, 1995] along with active markers to compute head pose in real-time. The system provides an enhanced virtual listening experience at low cost.