Toward 3D scene understanding via audio-description: Kinect-iPad fusion for the visually impaired

The Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility. Pub Date: 2011-10-24. DOI: 10.1145/2049536.2049613

J. D. Gomez, Sinan Mohammed, G. Bologna, T. Pun

Citations: 12

Abstract

Microsoft's Kinect 3D motion sensor is a low-cost 3D camera that provides color and depth information of indoor environments. In this demonstration, the functionality of this camera, originally intended for entertainment, is combined with the tangible interface of an iPad to benefit the visually impaired. A computer-vision-based framework for real-time object localization and audio description is introduced. First, objects are extracted from the scene and recognized using feature descriptors and machine learning. Second, each recognized object is labeled with an instrument sound, while its position in 3D space is conveyed by a virtual spatial sound source. As a result, the scene can be heard and explored by finger-triggering the sounds on the iPad, onto which a top view of the objects is mapped. This enables blindfolded users to build a mental occupancy grid of the environment. The approach presented here promises efficient assistance and could be adapted into an electronic travel aid for the visually impaired in the near future.
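The abstract does not describe implementation details, so the following is only a minimal sketch, under our own assumptions, of the sonification step: a recognized object's camera-frame position is projected onto a top view on the iPad screen, a finger touch is hit-tested against those markers, and the touched object returns an instrument label plus a spatial cue. Everything here is hypothetical: the `DetectedObject` type, the 768x1024 screen, the 4 m range, the class-to-instrument table, and the cue itself, which uses azimuth, constant-power stereo panning and inverse-distance attenuation as a simplified stand-in for the paper's virtual spatial sound sources (not the authors' method).

```python
import math
from dataclasses import dataclass

# Hypothetical class-to-instrument table; the paper's actual assignments are not given.
INSTRUMENTS = {"chair": "piano", "table": "guitar", "door": "trumpet"}


@dataclass
class DetectedObject:
    label: str  # class predicted by the (assumed) recognizer
    x: float    # lateral offset from the camera in meters, right is positive
    z: float    # depth from the camera in meters


def to_top_view(obj, screen_w=768, screen_h=1024, max_range=4.0):
    """Map camera-frame (x, z) to screen pixels, camera at the bottom centre (assumed layout)."""
    scale = screen_h / max_range        # pixels per meter
    px = screen_w / 2 + obj.x * scale
    py = screen_h - obj.z * scale       # farther objects appear higher on the screen
    return px, py


def spatial_cue(obj):
    """Simplified stand-in for a virtual sound source: azimuth, stereo gains, distance attenuation."""
    azimuth = math.atan2(obj.x, obj.z)               # 0 rad = straight ahead, positive = right
    distance = math.hypot(obj.x, obj.z)
    pan = (azimuth / (math.pi / 2) + 1) / 2          # map [-90 deg, +90 deg] to [0, 1]
    pan = min(max(pan, 0.0), 1.0)
    attenuation = 1.0 / max(distance, 1.0)           # inverse-distance law, capped at 1 within 1 m
    return {
        "instrument": INSTRUMENTS.get(obj.label, "bell"),  # fallback sound is also hypothetical
        "azimuth_deg": math.degrees(azimuth),
        "gain_left": math.cos(pan * math.pi / 2) * attenuation,   # constant-power panning
        "gain_right": math.sin(pan * math.pi / 2) * attenuation,
    }


def on_touch(touch_x, touch_y, objects, radius=60):
    """Return the cue of the nearest object marker within `radius` pixels of the finger, else None."""
    best, best_d = None, radius
    for obj in objects:
        px, py = to_top_view(obj)
        d = math.hypot(px - touch_x, py - touch_y)
        if d <= best_d:
            best, best_d = obj, d
    return spatial_cue(best) if best is not None else None
```

For example, with a chair at (-1.0, 2.0) m and a table at (0.5, 3.0) m, the table's marker lands at pixel (512, 256), so touching there returns the guitar cue with the right channel louder than the left. A real system would replace the stereo panning with proper 3D audio rendering; panning is used here only because it keeps the sketch self-contained.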
