Real-time online multimedia content processing: mobile video optical character recognition and speech synthesizer for the visual impaired
Shi-Yong Neo, Hai-Kiat Goh, Wendy Yen-Ni Ng, Jun-Da Ong, Wilson Pang
International Convention on Rehabilitation Engineering & Assistive Technology, 2007-04-23
DOI: 10.1145/1328491.1328541 (https://doi.org/10.1145/1328491.1328541)
Citations: 5
Abstract
One of the common difficulties faced by the visually impaired is the inability to read, which affects their way of life. Existing portable reading devices (using character recognition and speech synthesis) have many limitations and poor accuracy due to their restricted processing power. In this paper, we introduce a robust online multimedia content processing framework to alleviate the limitations of such portable devices. We leverage the high transfer speed of existing wireless networks to send multimedia information captured on mobile devices to high-end processing servers, and then stream the desired output back to users. Because the processing is carried out on the servers, the resulting framework supports more complex operations and thus outperforms standard portable devices in both accuracy and functionality. In addition, we describe a new approach that improves optical character recognition (OCR) results by using consecutive video frames for automatic character correction. Experiments with consecutive frames show a 25% improvement in accuracy over traditional OCR on a single image. The application has also been trialed by several visually impaired users, and the feedback obtained is encouraging.
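The abstract does not spell out how the consecutive-frame correction works, but one plausible reading is that per-frame OCR outputs for the same text region are aligned and fused by a character-level majority vote, so that a recognition error in one frame is outvoted by correct readings in neighbouring frames. The sketch below illustrates that idea only; the function name align_and_vote and the use of difflib.SequenceMatcher for alignment are illustrative assumptions, not the paper's actual method.

```python
from collections import Counter
from difflib import SequenceMatcher


def align_and_vote(frame_texts):
    """Fuse OCR outputs from consecutive video frames of the same text region.

    Uses the first frame's output as a reference string, aligns each later
    frame's output against it, and keeps the majority character at each
    aligned position. A purely illustrative sketch of frame-based correction.
    """
    if not frame_texts:
        return ""
    reference = frame_texts[0]
    # votes[i] collects candidate characters seen at position i of the reference.
    votes = [Counter({ch: 1}) for ch in reference]

    for text in frame_texts[1:]:
        matcher = SequenceMatcher(None, reference, text, autojunk=False)
        for op, r_start, r_end, t_start, t_end in matcher.get_opcodes():
            # Vote only where the aligned spans have equal length, so that
            # characters correspond one-to-one between the two frames.
            if op in ("equal", "replace") and (r_end - r_start) == (t_end - t_start):
                for offset in range(r_end - r_start):
                    votes[r_start + offset][text[t_start + offset]] += 1

    # Keep the most frequently observed character at each aligned position.
    return "".join(counter.most_common(1)[0][0] for counter in votes)


if __name__ == "__main__":
    # Simulated OCR output for the same sign across three consecutive frames.
    frames = ["EX1T LEFT", "EXIT LEFT", "EXIT LEF7"]
    print(align_and_vote(frames))  # -> "EXIT LEFT"
```

In this toy example, each frame contains a different recognition error, yet the vote recovers the correct string, which is the intuition behind using video rather than a single still image.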