Real-time translation of English speech through speech feature extraction

Xiaoyan Lei
Artificial Life and Robotics, vol. 29, no. 3, pp. 410-415 (journal article)
Published: 2024-05-27
DOI: 10.1007/s10015-024-00951-w
URL: https://link.springer.com/article/10.1007/s10015-024-00951-w
Impact factor: 0.8 (JCR Q4, Robotics)
Citations: 0


Real-time translation of English speech through speech feature extraction

Real-time English speech translation is useful in many settings, including business and travel. This research aims to improve the effectiveness of real-time English speech translation. First, filter bank (FBank) features were extracted from English speech. An enhanced Transformer model was then introduced, with a causal convolution module added at the front end of the encoder to capture English speech features together with positional information. The optimized model was evaluated on the MuST-C dataset for translating English speech into several target languages. Translation quality differed across target languages: the highest bilingual evaluation understudy (BLEU) score was 20.84 for Spanish text, and the lowest was 10.56 for Russian text. The average BLEU score was 18.51, with an average lag of 1202.33 ms. Compared with the conventional Transformer model, the improved model achieved higher BLEU scores and lower delay, and it performed best with a convolutional kernel size of 3 × 3. These results demonstrate the reliability of the improved Transformer model for real-time English speech translation and its practical usefulness.
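The abstract describes a causal convolution module placed in front of the encoder and reports that a 3 × 3 kernel worked best. The following is a minimal sketch of that idea only, not the paper's implementation: it assumes the FBank features are already available as a time × channel matrix, and the function name causal_conv2d, the plain-list representation, the zero padding, and the toy values are all hypothetical choices made for illustration.

```python
from typing import List

# Rows are time frames, columns are filter-bank channels (assumed layout).
Matrix = List[List[float]]


def causal_conv2d(features: Matrix, kernel: Matrix) -> Matrix:
    """Apply a small 2-D kernel to a time x channel map, causally in time.

    Output frame t depends only on input frames t-(kt-1) .. t, so no future
    frame is needed. The channel axis is zero-padded symmetrically, which
    keeps the output the same shape as the input.
    """
    kt, kf = len(kernel), len(kernel[0])
    n_frames, n_bins = len(features), len(features[0])
    pad_f = kf // 2
    out = [[0.0] * n_bins for _ in range(n_frames)]
    for t in range(n_frames):
        for f in range(n_bins):
            acc = 0.0
            for i in range(kt):
                # The last kernel row is aligned with the current frame.
                src_t = t - (kt - 1) + i
                if src_t < 0:
                    continue  # zero padding before the start of the utterance
                for j in range(kf):
                    src_f = f - pad_f + j
                    if 0 <= src_f < n_bins:
                        acc += kernel[i][j] * features[src_t][src_f]
            out[t][f] = acc
    return out


if __name__ == "__main__":
    # Toy stand-in for FBank features: 5 frames x 4 channels.
    fbank = [[float(t + f) for f in range(4)] for t in range(5)]
    kernel = [[1.0] * 3 for _ in range(3)]  # 3 x 3 kernel, as in the abstract
    before = causal_conv2d(fbank, kernel)

    # Changing the newest frame must not alter any earlier output frame.
    changed = [row[:] for row in fbank]
    changed[-1] = [100.0] * 4
    after = causal_conv2d(changed, kernel)
    print("earlier frames unchanged:", before[:-1] == after[:-1])
```

Because each output frame uses only the current and earlier frames, a module of this shape can run as audio arrives, which is the property a real-time system needs. A trained model would learn the kernel weights and use many channels; the all-ones kernel here only demonstrates the causality property.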

Source journal
CiteScore: 2.00
Self-citation rate: 22.20%
Articles published: 101
About the journal: Artificial Life and Robotics is an international journal publishing original technical papers and authoritative state-of-the-art reviews on the development of new technologies concerning artificial life and robotics, especially computer-based simulation and hardware for the twenty-first century. This journal covers a broad multidisciplinary field, including areas such as artificial brain research, artificial intelligence, artificial life, artificial living, artificial mind research, brain science, chaos, cognitive science, complexity, computer graphics, evolutionary computations, fuzzy control, genetic algorithms, innovative computations, intelligent control and modelling, micromachines, micro-robot world cup soccer tournament, mobile vehicles, neural networks, neurocomputers, neurocomputing technologies and applications, robotics, robus virtual engineering, and virtual reality. Hardware-oriented submissions are particularly welcome.
Publishing body: International Symposium on Artificial Life and Robotics
Editor-in-Chief: Hiroshi Tanaka, Hatanaka R Apartment 101, Hatanaka 8-7A, Ooaza-Hatanaka, Oita city, Oita, Japan 870-0856
© International Symposium on Artificial Life and Robotics