Real-time online multimedia content processing: mobile video optical character recognition and speech synthesizer for the visual impaired

Shi-Yong Neo, Hai-Kiat Goh, Wendy Yen-Ni Ng, Jun-Da Ong, Wilson Pang
International Convention on Rehabilitation Engineering & Assistive Technology, published 2007-04-23
DOI: 10.1145/1328491.1328541 (https://doi.org/10.1145/1328491.1328541)
Citations: 5

Abstract

One of the most common difficulties faced by the visually impaired is the inability to read, which affects their way of life. Existing portable reading devices, which combine character recognition and speech synthesis, have many limitations and poor accuracy because of their restricted processing power. In this paper, we introduce a robust online multimedia content processing framework to alleviate the limitations of such portable devices. We leverage the high transfer speeds of existing wireless networks to send multimedia information captured on mobile devices to high-end processing servers, and then stream the desired output back to users. Because processing is carried out on the servers, the framework supports more complex processes and thus outperforms standard portable devices in accuracy and functionality. In addition, we describe a new approach that improves optical character recognition (OCR) results by using consecutive video frames for automatic character correction. Experiments using consecutive frames show a 25% improvement in accuracy over traditional OCR using a single image. The application was also trialed by several visually impaired people, and the feedback obtained is encouraging.
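The abstract does not say how consecutive frames are combined, so the following is only a minimal sketch of one plausible reading: a per-position majority vote over the OCR strings returned for the same text region in successive frames. The function name `vote_characters` and the assumption that the strings are already aligned character by character are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import zip_longest


def vote_characters(frame_texts, pad="\0"):
    """Merge OCR strings from consecutive frames by per-position majority vote.

    Assumption (not from the paper): every string covers the same text region
    and is aligned position by position, so OCR errors are substitutions only.
    """
    if not frame_texts:
        return ""
    merged = []
    # Walk the strings column by column; shorter strings are padded.
    for column in zip_longest(*frame_texts, fillvalue=pad):
        # Most frequent character at this position; ties go to the earliest frame.
        winner, _ = Counter(column).most_common(1)[0]
        if winner != pad:
            merged.append(winner)
    return "".join(merged)


# Three noisy readings of the same word; each frame has at most one error.
print(vote_characters(["H3LLO", "HELLO", "HELL0"]))  # prints "HELLO"
```

A positional vote like this only corrects substitution errors. If frames drop or insert characters, the strings would first need to be aligned (for example with an edit-distance alignment) before voting.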