{"title":"内容交付网络驱动的音频识别增强英语交互场景","authors":"Liyuan Teng","doi":"10.1002/itl2.70105","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>The growing demand for real-time multilingual speech interaction systems poses significant challenges in terms of latency, scalability, and contextual accuracy. Conventional cloud-based solutions suffer from high delays, while edge devices lack computational resources for complex models. Static CDN configurations further exacerbate regional resource underutilization, and existing systems achieve relatively lower intent accuracy in multilingual scenarios. To address these limitations, we propose a three-tier framework integrating predictive CDN and a lightweight tiny machine learning-based audio recognition on which the multi-attention is introduced for context-aware English interaction scenarios. In particular, the LSTM model is leveraged to implement the on-device fine-tuning and context-aware so as to achieve well interactions. The experimental results demonstrate that TinyLSTM achieves superior performance with an error rate of 14.2% and an intent accuracy of 89.7%, while maintaining the lowest latency at 23 ms, making it highly effective for real-time applications on edge devices. Additionally, incorporating quantization, knowledge graphs, and emotion feedback progressively improves model effectiveness and increases engagement scores to 4.9, highlighting the importance of these components in enhancing both technical accuracy and user interaction quality.</p>\n </div>","PeriodicalId":100725,"journal":{"name":"Internet Technology Letters","volume":"8 5","pages":""},"PeriodicalIF":0.5000,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Content Delivery Network Driven Audio Recognition for Enhancing English Interaction Scenarios\",\"authors\":\"Liyuan Teng\",\"doi\":\"10.1002/itl2.70105\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>The growing demand for real-time multilingual speech interaction systems poses significant challenges in terms of latency, scalability, and contextual accuracy. Conventional cloud-based solutions suffer from high delays, while edge devices lack computational resources for complex models. Static CDN configurations further exacerbate regional resource underutilization, and existing systems achieve relatively lower intent accuracy in multilingual scenarios. To address these limitations, we propose a three-tier framework integrating predictive CDN and a lightweight tiny machine learning-based audio recognition on which the multi-attention is introduced for context-aware English interaction scenarios. In particular, the LSTM model is leveraged to implement the on-device fine-tuning and context-aware so as to achieve well interactions. The experimental results demonstrate that TinyLSTM achieves superior performance with an error rate of 14.2% and an intent accuracy of 89.7%, while maintaining the lowest latency at 23 ms, making it highly effective for real-time applications on edge devices. 
Additionally, incorporating quantization, knowledge graphs, and emotion feedback progressively improves model effectiveness and increases engagement scores to 4.9, highlighting the importance of these components in enhancing both technical accuracy and user interaction quality.</p>\\n </div>\",\"PeriodicalId\":100725,\"journal\":{\"name\":\"Internet Technology Letters\",\"volume\":\"8 5\",\"pages\":\"\"},\"PeriodicalIF\":0.5000,\"publicationDate\":\"2025-08-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Internet Technology Letters\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/itl2.70105\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"TELECOMMUNICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Internet Technology Letters","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/itl2.70105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}
Content Delivery Network Driven Audio Recognition for Enhancing English Interaction Scenarios
The growing demand for real-time multilingual speech interaction systems poses significant challenges in latency, scalability, and contextual accuracy. Conventional cloud-based solutions suffer from high delays, while edge devices lack the computational resources for complex models. Static CDN configurations further exacerbate regional resource underutilization, and existing systems achieve relatively low intent accuracy in multilingual scenarios. To address these limitations, we propose a three-tier framework that integrates a predictive CDN with lightweight, tiny-machine-learning-based audio recognition, into which a multi-attention mechanism is introduced for context-aware English interaction scenarios. In particular, an LSTM model is leveraged for on-device fine-tuning and context awareness to achieve fluent interactions. Experimental results demonstrate that TinyLSTM achieves superior performance, with an error rate of 14.2% and an intent accuracy of 89.7%, while maintaining the lowest latency at 23 ms, making it highly effective for real-time applications on edge devices. Additionally, incorporating quantization, knowledge graphs, and emotion feedback progressively improves model effectiveness and raises the engagement score to 4.9, highlighting the importance of these components in enhancing both technical accuracy and user interaction quality.
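The abstract does not give implementation details, so the following is only a rough, hypothetical sketch of the kind of edge-side component it describes: a small LSTM-based intent classifier over per-frame audio features, compressed with post-training dynamic quantization for on-device inference. It assumes PyTorch; the feature dimension, hidden size, and number of intent classes are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only (not the authors' TinyLSTM implementation):
# a small LSTM intent classifier plus int8 dynamic quantization for edge deployment.
import torch
import torch.nn as nn

class TinyLSTMClassifier(nn.Module):
    def __init__(self, n_mels=40, hidden=64, n_intents=10):  # sizes are assumed
        super().__init__()
        # Single-layer LSTM over per-frame audio features (e.g., log-mel frames).
        self.lstm = nn.LSTM(input_size=n_mels, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_intents)

    def forward(self, x):             # x: (batch, time, n_mels)
        _, (h_n, _) = self.lstm(x)    # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])     # intent logits: (batch, n_intents)

model = TinyLSTMClassifier().eval()

# Dynamic post-training quantization of LSTM and Linear weights to int8,
# a common way to cut model size and latency for on-device inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8
)

dummy = torch.randn(1, 100, 40)       # one utterance: 100 frames of 40 mel bins
print(quantized(dummy).shape)         # torch.Size([1, 10])
```

A setup along these lines would keep the on-device model small enough to meet the low-latency budget reported above, while the heavier context modeling (knowledge graphs, emotion feedback) could remain on the CDN/cloud tiers; the split shown here is an assumption for illustration.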