Intelligent English Teaching Video Traffic Classification in Wireless Communication Networks via Large Model-Enhanced Sparse Attention Vision Transformer
{"title":"Intelligent English Teaching Video Traffic Classification in Wireless Communication Networks via Large Model-Enhanced Sparse Attention Vision Transformer","authors":"Jinjin Liu","doi":"10.1002/itl2.70153","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>This letter presents a novel framework combining large language models with a sparse attention vision transformer (SA-ViT) to classify English teaching video traffic in wireless networks. Our approach analyzes both visual content frames and extracted English speech transcripts to identify educational content types, difficulty levels, and priority requirements. The proposed model transforms video frames into visual patches while simultaneously processing English linguistic content through pre-trained language models, enabling an understanding of educational multimedia traffic. Through extensive evaluation of real-world English teaching video datasets transmitted over wireless networks, our SA-ViT framework achieves 97.5% classification accuracy, representing an 11.3% improvement over conventional video traffic classification methods. The results demonstrate effective integration of visual understanding, English language comprehension, and wireless network optimization for enhanced educational content delivery.</p>\n </div>","PeriodicalId":100725,"journal":{"name":"Internet Technology Letters","volume":"8 6","pages":""},"PeriodicalIF":0.5000,"publicationDate":"2025-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Internet Technology Letters","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/itl2.70153","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}
Abstract
This letter presents a novel framework that combines large language models with a sparse attention vision transformer (SA-ViT) to classify English teaching video traffic in wireless networks. Our approach analyzes both visual content frames and extracted English speech transcripts to identify educational content types, difficulty levels, and priority requirements. The proposed model transforms video frames into visual patches while simultaneously processing the English linguistic content with pre-trained language models, enabling a joint understanding of educational multimedia traffic. In extensive evaluations on real-world English teaching video datasets transmitted over wireless networks, our SA-ViT framework achieves 97.5% classification accuracy, an 11.3% improvement over conventional video traffic classification methods. The results demonstrate effective integration of visual understanding, English language comprehension, and wireless network optimization for enhanced educational content delivery.
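The abstract outlines a pipeline of frame patching, sparse attention in a ViT encoder, transcript features from a pre-trained language model, and a fused classification head, but gives no implementation details. The sketch below is a minimal PyTorch illustration of that idea under stated assumptions: the top-k attention sparsification, the layer sizes, fusion by concatenation, and the names `TopKSelfAttention` and `SAViTTrafficClassifier` are all illustrative choices, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the paper's code): frames -> patches -> sparse-attention
# ViT encoder; the visual [CLS] embedding is concatenated with a transcript embedding
# (assumed to come from a pre-trained language model) and fed to a linear classifier.
import torch
import torch.nn as nn


class TopKSelfAttention(nn.Module):
    """Multi-head self-attention that keeps only the k largest scores per query (assumed sparsity rule)."""

    def __init__(self, dim: int, num_heads: int = 4, k: int = 16):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.k = k
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                      # each: (B, heads, N, head_dim)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5   # (B, heads, N, N)

        # Sparsify: mask out everything except the top-k keys for each query.
        topk = min(self.k, N)
        kth = scores.topk(topk, dim=-1).values[..., -1:]          # k-th largest score per query
        scores = scores.masked_fill(scores < kth, float("-inf"))

        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out)


class SparseViTBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, k: int = 16):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = TopKSelfAttention(dim, num_heads, k)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        return x + self.mlp(self.norm2(x))


class SAViTTrafficClassifier(nn.Module):
    """Toy SA-ViT-style classifier: frame patches + transcript embedding -> traffic class."""

    def __init__(self, img_size=224, patch=16, dim=192, depth=4, text_dim=768, num_classes=5):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        self.blocks = nn.ModuleList(SparseViTBlock(dim) for _ in range(depth))
        self.norm = nn.LayerNorm(dim)
        # Fuse the visual [CLS] embedding with the transcript embedding by concatenation (assumption).
        self.head = nn.Linear(dim + text_dim, num_classes)

    def forward(self, frames: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        B = frames.shape[0]
        x = self.patch_embed(frames).flatten(2).transpose(1, 2)   # (B, N, dim)
        x = torch.cat([self.cls_token.expand(B, -1, -1), x], dim=1) + self.pos_embed
        for blk in self.blocks:
            x = blk(x)
        visual = self.norm(x)[:, 0]                               # [CLS] embedding
        return self.head(torch.cat([visual, text_emb], dim=-1))


if __name__ == "__main__":
    model = SAViTTrafficClassifier()
    frames = torch.randn(2, 3, 224, 224)   # sampled video frames
    text_emb = torch.randn(2, 768)         # e.g., a sentence embedding of the speech transcript
    print(model(frames, text_emb).shape)   # torch.Size([2, 5])
```

In this sketch, sparsity is imposed by keeping only the k strongest key scores per query before the softmax; other sparse-attention schemes (local windows, strided patterns) would slot into the same block, and the letter does not specify which variant SA-ViT uses.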