AdaStreamLite

IF 4.5 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies Pub Date : 2024-01-12 DOI:10.1145/3631460

Yuheng Wei, Jie Xiong, Hui Liu, Yingtao Yu, Jiangtao Pan, Junzhao Du

{"title":"AdaStreamLite","authors":"Yuheng Wei, Jie Xiong, Hui Liu, Yingtao Yu, Jiangtao Pan, Junzhao Du","doi":"10.1145/3631460","DOIUrl":null,"url":null,"abstract":"Streaming speech recognition aims to transcribe speech to text in a streaming manner, providing real-time speech interaction for smartphone users. However, it is not trivial to develop a high-performance streaming speech recognition system purely running on mobile platforms, due to the complex real-world acoustic environments and the limited computational resources of smartphones. Most existing solutions lack the generalization to unseen environments and have difficulty to work with streaming speech. In this paper, we design AdaStreamLite, an environment-adaptive streaming speech recognition tool for smartphones. AdaStreamLite interacts with its surroundings to capture the characteristics of the current acoustic environment to improve the robustness against ambient noise in a lightweight manner. We design an environment representation extractor to model acoustic environments with compact feature vectors, and construct a representation lookup table to improve the generalization of AdaStreamLite to unseen environments. We train our system using large speech datasets publicly available covering different languages. We conduct experiments in a large range of real acoustic environments with different smartphones. The results show that AdaStreamLite outperforms the state-of-the-art methods in terms of recognition accuracy, computational resource consumption and robustness against unseen environments.","PeriodicalId":20553,"journal":{"name":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","volume":"11 22","pages":"1 - 29"},"PeriodicalIF":4.5000,"publicationDate":"2024-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3631460","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Streaming speech recognition aims to transcribe speech to text in a streaming manner, providing real-time speech interaction for smartphone users. However, it is not trivial to develop a high-performance streaming speech recognition system purely running on mobile platforms, due to the complex real-world acoustic environments and the limited computational resources of smartphones. Most existing solutions lack the generalization to unseen environments and have difficulty to work with streaming speech. In this paper, we design AdaStreamLite, an environment-adaptive streaming speech recognition tool for smartphones. AdaStreamLite interacts with its surroundings to capture the characteristics of the current acoustic environment to improve the robustness against ambient noise in a lightweight manner. We design an environment representation extractor to model acoustic environments with compact feature vectors, and construct a representation lookup table to improve the generalization of AdaStreamLite to unseen environments. We train our system using large speech datasets publicly available covering different languages. We conduct experiments in a large range of real acoustic environments with different smartphones. The results show that AdaStreamLite outperforms the state-of-the-art methods in terms of recognition accuracy, computational resource consumption and robustness against unseen environments.

查看原文本刊更多论文

AdaStreamLite

流式语音识别旨在以流式方式将语音转录为文本，为智能手机用户提供实时语音交互。然而，由于现实世界的声学环境复杂，智能手机的计算资源有限，要开发一个纯粹在移动平台上运行的高性能流式语音识别系统并非易事。大多数现有解决方案缺乏对未知环境的泛化能力，并且难以处理流式语音。在本文中，我们设计了 AdaStreamLite，一种用于智能手机的环境适应型流式语音识别工具。AdaStreamLite 可与周围环境互动，捕捉当前声学环境的特征，从而以轻量级方式提高对环境噪声的鲁棒性。我们设计了一个环境表征提取器，用紧凑的特征向量对声学环境进行建模，并构建了一个表征查找表，以提高 AdaStreamLite 对未知环境的泛化能力。我们使用公开的涵盖不同语言的大型语音数据集来训练我们的系统。我们使用不同的智能手机在大量真实的声学环境中进行了实验。结果表明，AdaStreamLite 在识别准确率、计算资源消耗和对未知环境的鲁棒性方面都优于最先进的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies Computer Science-Computer Networks and Communications

CiteScore

9.10

自引率

0.00%

发文量

154