{"title":"内容交付网络驱动的音频识别增强英语交互场景","authors":"Liyuan Teng","doi":"10.1002/itl2.70105","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>The growing demand for real-time multilingual speech interaction systems poses significant challenges in terms of latency, scalability, and contextual accuracy. Conventional cloud-based solutions suffer from high delays, while edge devices lack computational resources for complex models. Static CDN configurations further exacerbate regional resource underutilization, and existing systems achieve relatively lower intent accuracy in multilingual scenarios. To address these limitations, we propose a three-tier framework integrating predictive CDN and a lightweight tiny machine learning-based audio recognition on which the multi-attention is introduced for context-aware English interaction scenarios. In particular, the LSTM model is leveraged to implement the on-device fine-tuning and context-aware so as to achieve well interactions. The experimental results demonstrate that TinyLSTM achieves superior performance with an error rate of 14.2% and an intent accuracy of 89.7%, while maintaining the lowest latency at 23 ms, making it highly effective for real-time applications on edge devices. Additionally, incorporating quantization, knowledge graphs, and emotion feedback progressively improves model effectiveness and increases engagement scores to 4.9, highlighting the importance of these components in enhancing both technical accuracy and user interaction quality.</p>\n </div>","PeriodicalId":100725,"journal":{"name":"Internet Technology Letters","volume":"8 5","pages":""},"PeriodicalIF":0.5000,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Content Delivery Network Driven Audio Recognition for Enhancing English Interaction Scenarios\",\"authors\":\"Liyuan Teng\",\"doi\":\"10.1002/itl2.70105\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>The growing demand for real-time multilingual speech interaction systems poses significant challenges in terms of latency, scalability, and contextual accuracy. Conventional cloud-based solutions suffer from high delays, while edge devices lack computational resources for complex models. Static CDN configurations further exacerbate regional resource underutilization, and existing systems achieve relatively lower intent accuracy in multilingual scenarios. To address these limitations, we propose a three-tier framework integrating predictive CDN and a lightweight tiny machine learning-based audio recognition on which the multi-attention is introduced for context-aware English interaction scenarios. In particular, the LSTM model is leveraged to implement the on-device fine-tuning and context-aware so as to achieve well interactions. The experimental results demonstrate that TinyLSTM achieves superior performance with an error rate of 14.2% and an intent accuracy of 89.7%, while maintaining the lowest latency at 23 ms, making it highly effective for real-time applications on edge devices. 
Additionally, incorporating quantization, knowledge graphs, and emotion feedback progressively improves model effectiveness and increases engagement scores to 4.9, highlighting the importance of these components in enhancing both technical accuracy and user interaction quality.</p>\\n </div>\",\"PeriodicalId\":100725,\"journal\":{\"name\":\"Internet Technology Letters\",\"volume\":\"8 5\",\"pages\":\"\"},\"PeriodicalIF\":0.5000,\"publicationDate\":\"2025-08-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Internet Technology Letters\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/itl2.70105\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"TELECOMMUNICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Internet Technology Letters","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/itl2.70105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}
Content Delivery Network Driven Audio Recognition for Enhancing English Interaction Scenarios
The growing demand for real-time multilingual speech interaction systems poses significant challenges in latency, scalability, and contextual accuracy. Conventional cloud-based solutions suffer from high delays, while edge devices lack the computational resources for complex models. Static CDN configurations further exacerbate regional resource underutilization, and existing systems achieve relatively low intent accuracy in multilingual scenarios. To address these limitations, we propose a three-tier framework that integrates a predictive CDN with lightweight, tiny-machine-learning-based audio recognition, into which a multi-attention mechanism is introduced for context-aware English interaction scenarios. In particular, an LSTM model is leveraged for on-device fine-tuning and context awareness to achieve fluent interactions. Experimental results demonstrate that TinyLSTM achieves superior performance, with an error rate of 14.2% and an intent accuracy of 89.7%, while maintaining the lowest latency at 23 ms, making it highly effective for real-time applications on edge devices. Additionally, incorporating quantization, knowledge graphs, and emotion feedback progressively improves model effectiveness and raises the engagement score to 4.9, highlighting the importance of these components in enhancing both technical accuracy and user interaction quality.
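The abstract does not give implementation details, so the following is only a rough, hypothetical sketch of the kind of edge-side component it describes: a small LSTM-based intent classifier over per-frame audio features, compressed with post-training dynamic quantization for on-device inference. It assumes PyTorch; the feature dimension, hidden size, and number of intent classes are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only (not the authors' TinyLSTM implementation):
# a small LSTM intent classifier plus int8 dynamic quantization for edge deployment.
import torch
import torch.nn as nn

class TinyLSTMClassifier(nn.Module):
    def __init__(self, n_mels=40, hidden=64, n_intents=10):  # sizes are assumed
        super().__init__()
        # Single-layer LSTM over per-frame audio features (e.g., log-mel frames).
        self.lstm = nn.LSTM(input_size=n_mels, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_intents)

    def forward(self, x):             # x: (batch, time, n_mels)
        _, (h_n, _) = self.lstm(x)    # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])     # intent logits: (batch, n_intents)

model = TinyLSTMClassifier().eval()

# Dynamic post-training quantization of LSTM and Linear weights to int8,
# a common way to cut model size and latency for on-device inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)

dummy = torch.randn(1, 100, 40)       # one utterance: 100 frames of 40 mel bins
print(quantized(dummy).shape)         # torch.Size([1, 10])
```

A setup along these lines would keep the on-device model small enough to meet the low-latency budget reported above, while the heavier context modeling (knowledge graphs, emotion feedback) could remain on the CDN/cloud tiers; the split shown here is an assumption for illustration.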