Intelligent English Teaching Video Traffic Classification in Wireless Communication Networks via Large Model-Enhanced Sparse Attention Vision Transformer
{"title":"Intelligent English Teaching Video Traffic Classification in Wireless Communication Networks via Large Model-Enhanced Sparse Attention Vision Transformer","authors":"Jinjin Liu","doi":"10.1002/itl2.70153","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>This letter presents a novel framework combining large language models with a sparse attention vision transformer (SA-ViT) to classify English teaching video traffic in wireless networks. Our approach analyzes both visual content frames and extracted English speech transcripts to identify educational content types, difficulty levels, and priority requirements. The proposed model transforms video frames into visual patches while simultaneously processing English linguistic content through pre-trained language models, enabling an understanding of educational multimedia traffic. Through extensive evaluation of real-world English teaching video datasets transmitted over wireless networks, our SA-ViT framework achieves 97.5% classification accuracy, representing an 11.3% improvement over conventional video traffic classification methods. The results demonstrate effective integration of visual understanding, English language comprehension, and wireless network optimization for enhanced educational content delivery.</p>\n </div>","PeriodicalId":100725,"journal":{"name":"Internet Technology Letters","volume":"8 6","pages":""},"PeriodicalIF":0.5000,"publicationDate":"2025-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Internet Technology Letters","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/itl2.70153","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}
Abstract
This letter presents a novel framework that combines large language models with a sparse attention vision transformer (SA-ViT) to classify English teaching video traffic in wireless networks. Our approach analyzes both visual content frames and extracted English speech transcripts to identify educational content types, difficulty levels, and priority requirements. The proposed model transforms video frames into visual patches while simultaneously processing the English linguistic content with pre-trained language models, enabling a joint understanding of educational multimedia traffic. In extensive evaluations on real-world English teaching video datasets transmitted over wireless networks, our SA-ViT framework achieves 97.5% classification accuracy, an 11.3% improvement over conventional video traffic classification methods. The results demonstrate effective integration of visual understanding, English language comprehension, and wireless network optimization for enhanced educational content delivery.
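The abstract outlines a pipeline of frame patching, sparse attention in a ViT encoder, transcript features from a pre-trained language model, and a fused classification head, but gives no implementation details. The sketch below is a minimal PyTorch illustration of that idea under stated assumptions: the top-k attention sparsification, the layer sizes, fusion by concatenation, and the names `TopKSelfAttention` and `SAViTTrafficClassifier` are all illustrative choices, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the paper's code): frames -> patches -> sparse-attention
# ViT encoder; the visual [CLS] embedding is concatenated with a transcript embedding
# (assumed to come from a pre-trained language model) and fed to a linear classifier.
import torch
import torch.nn as nn


class TopKSelfAttention(nn.Module):
    """Multi-head self-attention that keeps only the k largest scores per query (assumed sparsity rule)."""

    def __init__(self, dim: int, num_heads: int = 4, k: int = 16):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.k = k
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                      # each: (B, heads, N, head_dim)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5   # (B, heads, N, N)

        # Sparsify: mask out everything except the top-k keys for each query.
        topk = min(self.k, N)
        kth = scores.topk(topk, dim=-1).values[..., -1:]          # k-th largest score per query
        scores = scores.masked_fill(scores < kth, float("-inf"))

        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out)


class SparseViTBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, k: int = 16):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = TopKSelfAttention(dim, num_heads, k)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        return x + self.mlp(self.norm2(x))


class SAViTTrafficClassifier(nn.Module):
    """Toy SA-ViT-style classifier: frame patches + transcript embedding -> traffic class."""

    def __init__(self, img_size=224, patch=16, dim=192, depth=4, text_dim=768, num_classes=5):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        self.blocks = nn.ModuleList(SparseViTBlock(dim) for _ in range(depth))
        self.norm = nn.LayerNorm(dim)
        # Fuse the visual [CLS] embedding with the transcript embedding by concatenation (assumption).
        self.head = nn.Linear(dim + text_dim, num_classes)

    def forward(self, frames: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        B = frames.shape[0]
        x = self.patch_embed(frames).flatten(2).transpose(1, 2)   # (B, N, dim)
        x = torch.cat([self.cls_token.expand(B, -1, -1), x], dim=1) + self.pos_embed
        for blk in self.blocks:
            x = blk(x)
        visual = self.norm(x)[:, 0]                               # [CLS] embedding
        return self.head(torch.cat([visual, text_emb], dim=-1))


if __name__ == "__main__":
    model = SAViTTrafficClassifier()
    frames = torch.randn(2, 3, 224, 224)   # sampled video frames
    text_emb = torch.randn(2, 768)         # e.g., a sentence embedding of the speech transcript
    print(model(frames, text_emb).shape)   # torch.Size([2, 5])
```

In this sketch, sparsity is imposed by keeping only the k strongest key scores per query before the softmax; other sparse-attention schemes (local windows, strided patterns) would slot into the same block, and the letter does not specify which variant SA-ViT uses.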