AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models

Proceedings of the Augmented Humans International Conference 2023, Pub Date: 2023-03-12, DOI: 10.1145/3582700.3582722

Kazuki Kawamura, J. Rekimoto


Citations: 0

Abstract



Since humans can listen to audio and watch videos at faster speeds than actually observed, we often listen to or watch these pieces of content at higher playback speeds to increase the time efficiency of content comprehension. To further utilize this capability, systems that automatically adjust the playback speed according to the user’s condition and the type of content to assist in more efficient comprehension of time-series content have been developed. However, there is still room for these systems to further extend human speed-listening ability by generating speech with playback speed optimized for even finer time units and providing it to humans. In this study, we determine whether humans can hear the optimized speech and propose a system that automatically adjusts playback speed at units as small as phonemes while ensuring speech intelligibility. The system uses the speech recognizer score as a proxy for how well a human can hear a certain unit of speech and maximizes the speech playback speed to the extent that a human can hear. This method can be used to produce fast but intelligible speech. In the evaluation experiment, we compared the speech played back at a constant fast speed and the flexibly speed-up speech generated by the proposed method in a blind test and confirmed that the proposed method produced speech that was easier to listen to.
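The selection idea described in the abstract (speed each unit up as far as a recognizer-style score allows) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the candidate speed grid, the 0.8 threshold, and the toy scoring function are all assumptions made for illustration, and a real system would score time-stretched audio with an actual speech recognizer rather than strings.

```python
from typing import Callable, List, Sequence


def pick_speeds(
    segments: Sequence[str],
    score: Callable[[str, float], float],
    candidate_speeds: Sequence[float] = (1.0, 1.25, 1.5, 1.75, 2.0),
    threshold: float = 0.8,
) -> List[float]:
    """Return, per segment, the fastest candidate speed whose score is >= threshold.

    `score(segment, speed)` is a hypothetical stand-in for a speech recognizer's
    confidence on that segment when played back at `speed`.
    """
    chosen = []
    for seg in segments:
        passing = [s for s in candidate_speeds if score(seg, s) >= threshold]
        # If no candidate passes, fall back to the slowest candidate speed.
        chosen.append(max(passing) if passing else min(candidate_speeds))
    return chosen


def toy_score(segment: str, speed: float) -> float:
    """Illustrative score only: longer units lose intelligibility faster."""
    return max(0.0, 1.0 - 0.1 * len(segment) * (speed - 1.0))


# Short units tolerate more speed-up than long ones under the toy score.
speeds = pick_speeds(["a", "str", "ngths"], toy_score)
print(speeds)  # [2.0, 1.5, 1.25]
```

Picking the maximum over a small grid keeps the sketch simple. Whether the paper searches a grid, optimizes a continuous speed, or smooths speeds across neighboring units is not stated in the abstract.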


Source journal

Proceedings of the Augmented Humans International Conference 2023

Self-citation rate

0.00%

Number of publications