Intelligent English Teaching Video Traffic Classification in Wireless Communication Networks via Large Model-Enhanced Sparse Attention Vision Transformer
Jinjin Liu. Internet Technology Letters, Vol. 8, No. 6. Published 2025-10-04. DOI: 10.1002/itl2.70153. https://onlinelibrary.wiley.com/doi/10.1002/itl2.70153
This letter presents a novel framework that combines large language models with a sparse attention vision transformer (SA-ViT) to classify English teaching video traffic in wireless networks. Our approach analyzes both visual content frames and extracted English speech transcripts to identify educational content types, difficulty levels, and priority requirements. The proposed model transforms video frames into visual patches while simultaneously processing the English linguistic content through pre-trained language models, enabling joint visual and linguistic understanding of educational multimedia traffic. In an extensive evaluation on real-world English teaching video datasets transmitted over wireless networks, our SA-ViT framework achieves 97.5% classification accuracy, an 11.3% improvement over conventional video traffic classification methods. The results demonstrate effective integration of visual understanding, English language comprehension, and wireless network optimization for enhanced educational content delivery.
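The abstract outlines a pipeline of splitting frames into visual patches, applying sparse attention over them, and combining the result with language-model features of the transcript to predict content type, difficulty, and priority. The NumPy sketch below illustrates that data flow only, under stated assumptions: the letter does not specify its sparsity pattern, fusion method, or label sets, so this example assumes top-k sparse attention (each patch attends only to its k highest-scoring patches) and late fusion by concatenation. `TinySAViTClassifier`, its random weights, the head names, and the class counts are hypothetical, and the transcript embedding is a random stand-in for a pre-trained language model's output.

```python
import numpy as np


def patchify(frame, patch):
    """Split an (H, W, C) frame into non-overlapping patch x patch tiles.

    Returns shape (num_patches, patch * patch * C), as in the standard ViT
    patch step. H and W must be divisible by `patch`.
    """
    h, w, c = frame.shape
    assert h % patch == 0 and w % patch == 0, "frame size must be divisible by patch"
    tiles = frame.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch * patch * c)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def topk_sparse_attention(q, k, v, top_k):
    """Single-head attention where each query keeps only its top_k key scores.

    ASSUMPTION: top-k selection is one common sparse-attention pattern; the
    letter does not say which pattern SA-ViT uses. Ties may keep a few extra keys.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                        # (n_q, n_k)
    kth = np.sort(scores, axis=-1)[:, -top_k][:, None]   # per-row threshold
    masked = np.where(scores >= kth, scores, -np.inf)    # drop all other keys
    weights = softmax(masked, axis=-1)                   # exactly 0 outside top_k
    return weights @ v, weights


class TinySAViTClassifier:
    """Hypothetical toy model: patch embedding -> sparse attention -> mean pool,
    concatenated with a transcript embedding, then one linear head per task.
    Weights are random; this shows data flow, not the paper's trained model.
    """

    def __init__(self, patch, channels, d_model, d_text, n_classes, top_k, seed=0):
        rng = np.random.default_rng(seed)
        self.patch, self.top_k = patch, top_k
        self.w_embed = rng.normal(0, 0.02, (patch * patch * channels, d_model))
        self.w_q, self.w_k, self.w_v = (
            rng.normal(0, 0.02, (d_model, d_model)) for _ in range(3)
        )
        # One head per prediction target, reading the fused visual + text vector.
        self.heads = {
            name: rng.normal(0, 0.02, (d_model + d_text, n))
            for name, n in n_classes.items()
        }

    def forward(self, frame, text_emb):
        tokens = patchify(frame, self.patch) @ self.w_embed          # (n_patches, d_model)
        attended, _ = topk_sparse_attention(
            tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v, self.top_k
        )
        visual = attended.mean(axis=0)                               # pool over patches
        fused = np.concatenate([visual, text_emb])                   # late fusion (assumed)
        return {name: softmax(fused @ w) for name, w in self.heads.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    frame = rng.random((32, 32, 3))    # stand-in for one decoded video frame
    text_emb = rng.normal(size=64)     # stand-in for a transcript embedding
    model = TinySAViTClassifier(
        patch=8, channels=3, d_model=32, d_text=64,
        n_classes={"content_type": 4, "difficulty": 3, "priority": 2}, top_k=4,
    )
    for name, probs in model.forward(frame, text_emb).items():
        print(name, probs.shape, int(probs.argmax()))
```

Setting `top_k` equal to the number of patches recovers ordinary dense softmax attention, which is a quick way to check the masking. Smaller values of `top_k` limit how many patches each query mixes, which is the intuition behind using sparse attention to reduce attention cost on many-patch video frames.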