A Vision Aid for the Visually Impaired using Commodity Dual-Rear-Camera Smartphones

M. Nguyen, H. Le, W. Yan, Arpita Dawda
Published in: 2018 25th International Conference on Mechatronics and Machine Vision in Practice (M2VIP), 2018-11-01. DOI: 10.1109/M2VIP.2018.8600857 (https://doi.org/10.1109/M2VIP.2018.8600857)
Citation count: 3

Abstract

Dual (or multiple) rear cameras on hand-held smartphones are believed to be the future of mobile photography. Recently, many such phones have been released, mainly with dual rear cameras: one wide-angle and one telephoto. Notable examples are the Apple iPhone 7 and 8 Plus, iPhone X, Samsung Galaxy S9, LG V30, and Huawei Mate 10. With built-in dual-camera systems, these devices can not only produce better-quality pictures but also acquire 3D stereo photos with depth information. They can thus capture a moment with depth, much as our two-eye visual system does. Thanks to this trend, these phones are getting cheaper while becoming more capable. In this paper, we describe a system that uses commercial dual-rear-camera phones, such as the iPhone X, to aid people who are visually impaired. We propose a design in which the phone is placed at the centre of the user's chest, and the user wears one or two Bluetooth headphones to listen to the phone's audio output. Our system consists of three modules: (1) scene context recognition to audio, (2) 3D stereo reconstruction to audio, and (3) interactive audio/voice controls. In slightly more detail, the wide-angle camera captures live photos that are analysed by a GPS-guided deep learning process to describe the scene in front of the user (module 1). The telephoto camera captures a narrower field of view, which is stereo-reconstructed with the aid of the wide-angle image to form a depth map (a dense, area-based distance map). The map is used to determine the distance to all visible objects so that the user can be notified of the critical ones (module 2). This module also makes the phone vibrate when an object is close enough to the user, e.g. within hand-reach distance. The user can also query the system by asking questions and receive automatic voice answers (module 3). In addition, a manual rescue module (module 4) is included for cases where other things have gone wrong. An example of the vision-to-audio output could be "Overall, likely a corridor, one medium object is 0.5 m away - central left", or "Overall, city pathway, front cleared". An audio command input may be "read texts", upon which the phone detects and reads all text on the closest object. The design and implementation are described in more detail in this paper.
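
The depth-to-audio step of module 2 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the depth map is already available as a grid of distances in metres (with non-positive values marking invalid pixels), and the names `describe`, `nearest_obstacle`, and `region_name`, the 0.7 m hand-reach threshold, and the 3 m reporting range are all hypothetical choices made for illustration. Object size classification ("medium") is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed values: the abstract only says "within hand reach distance".
HAND_REACH_M = 0.7
MAX_RANGE_M = 3.0


@dataclass
class Alert:
    message: str   # text to be passed to a speech synthesiser
    vibrate: bool  # whether the phone should vibrate


def nearest_obstacle(depth_m: List[List[float]]) -> Optional[Tuple[float, int, int]]:
    """Return (distance, row, col) of the closest valid depth sample, if any."""
    best = None
    for r, row in enumerate(depth_m):
        for c, d in enumerate(row):
            # Non-positive depth is treated as "no measurement" (assumption).
            if d > 0 and (best is None or d < best[0]):
                best = (d, r, c)
    return best


def region_name(row: int, col: int, n_rows: int, n_cols: int) -> str:
    """Map a pixel position to a coarse 3x3 region such as 'central left'."""
    vertical = ("upper", "central", "lower")[min(2, row * 3 // n_rows)]
    horizontal = ("left", "centre", "right")[min(2, col * 3 // n_cols)]
    if vertical == "central" and horizontal == "centre":
        return "centre"
    return f"{vertical} {horizontal}"


def describe(scene_label: str, depth_m: List[List[float]]) -> Alert:
    """Combine a scene label (module 1) with the depth map (module 2)."""
    hit = nearest_obstacle(depth_m)
    if hit is None or hit[0] > MAX_RANGE_M:
        return Alert(f"Overall, {scene_label}, front cleared", False)
    distance, row, col = hit
    where = region_name(row, col, len(depth_m), len(depth_m[0]))
    message = f"Overall, {scene_label}, one object is {distance:.1f} m away - {where}"
    return Alert(message, distance <= HAND_REACH_M)


if __name__ == "__main__":
    grid = [
        [2.0, 2.5, 3.0],
        [0.5, 2.0, 2.8],
        [2.2, 2.4, 0.0],  # 0.0 marks an invalid pixel
    ]
    alert = describe("likely a corridor", grid)
    print(alert.message, "| vibrate:", alert.vibrate)
```

Reporting only the single closest sample keeps the spoken message short. A fuller system would presumably segment the depth map into objects before ranking them by distance.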