Towards Neural Codec-Empowered 360$^\circ$ Video Streaming: A Saliency-Aided Synergistic Approach

Impact factor 8.4; CAS Zone 1 (Computer Science); JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia; Pub Date: 2024-12-30; DOI: 10.1109/TMM.2024.3521770

Jianxin Shi, Miao Zhang, Linfeng Shen, Jiangchuan Liu, Lingjun Pu, Jingdong Xu

Vol. 27, pp. 1588-1600


Abstract

Networked 360$^\circ$ video has become increasingly popular. It gives users an immersive experience, but its sheer data volume remains a significant challenge for today's networks, even with the latest H.266 coding and viewport adaptation. Recent studies have shown that integrating deep learning into video coding can significantly improve compression efficiency, which opens new opportunities for high-quality video streaming. In this work, we conduct a comprehensive analysis of the potential and the issues of applying neural codecs to 360$^\circ$ video streaming. Based on this analysis, we present $\mathsf{NETA}$, a synergistic streaming scheme that merges neural compression with traditional coding techniques and is implemented within an edge intelligence framework. Two challenges are non-trivial: the short viewport prediction window and time-varying viewing directions. To address them, we propose implicit-explicit buffer-based prefetching grounded in content visual saliency. We pair this with bitrate adaptation that performs smart model switching around viewports. A novel Lyapunov-guided deep reinforcement learning algorithm is developed to maximize user experience while ensuring long-term system stability. We further discuss concerns regarding practical development and deployment. We have also built a working prototype that verifies $\mathsf{NETA}$'s performance. On average, compared with state-of-the-art approaches, it achieves:
- a 27% improvement in viewing quality;
- a 90% reduction in rebuffering time;
- a 64% decrease in quality variation.
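The abstract names a Lyapunov-guided deep reinforcement learning algorithm but does not give its formulation. Below is a minimal sketch of the generic Lyapunov drift-plus-penalty idea applied to per-chunk bitrate selection. It rests on several assumptions:
- QoE is linear: quality minus weighted rebuffering and quality-variation penalties, matching the three metrics the abstract reports.
- A virtual queue Q(t) enforces that the time-average download time does not exceed the chunk duration, i.e., the buffer stays stable.
- A greedy one-step selector replaces the DRL agent.

All names, weights, the bitrate ladder and the bandwidth trace are hypothetical and are not taken from NETA.

```python
import math
from dataclasses import dataclass

# Hypothetical tuning constants for illustration only; NOT values from NETA.
CHUNK_SEC = 1.0  # playback duration of one chunk, in seconds
MU = 4.3         # weight on rebuffering time in the QoE
LAM = 1.0        # weight on quality variation in the QoE
V = 10.0         # Lyapunov trade-off: larger V favours QoE over queue stability


def quality(bitrate_mbps: float) -> float:
    """Assumed concave (log-shaped) quality utility of a bitrate in Mbps."""
    return math.log1p(bitrate_mbps)


@dataclass
class StreamState:
    buffer_sec: float    # seconds of video already buffered
    prev_quality: float  # quality of the previously fetched chunk
    queue: float = 0.0   # virtual queue Q(t) for the stability constraint


def score(state: StreamState, bitrate: float, bandwidth: float) -> tuple[float, float, float]:
    """Drift-plus-penalty score V*QoE(t) - Q(t)*(d(t) - T) for one bitrate choice.

    Returns (score, download time d(t), rebuffering time).
    """
    download = bitrate * CHUNK_SEC / bandwidth        # seconds to fetch the chunk
    rebuffer = max(download - state.buffer_sec, 0.0)  # stall if the buffer runs dry
    q = quality(bitrate)
    qoe = q - MU * rebuffer - LAM * abs(q - state.prev_quality)
    # Drift-bound term: penalises slow fetches in proportion to the backlog Q(t).
    drift = state.queue * (download - CHUNK_SEC)
    return V * qoe - drift, download, rebuffer


def choose_bitrate(state: StreamState, ladder: list[float], bandwidth: float) -> float:
    """Greedy one-step maximiser (a stand-in for the DRL policy)."""
    return max(ladder, key=lambda r: score(state, r, bandwidth)[0])


def advance(state: StreamState, bitrate: float, bandwidth: float) -> StreamState:
    """Apply the chosen bitrate: update the buffer and the virtual queue."""
    _, download, _ = score(state, bitrate, bandwidth)
    buffer_after = max(state.buffer_sec - download, 0.0) + CHUNK_SEC
    # Q(t+1) = max(Q(t) + d(t) - T, 0): grows while fetching is slower than playback.
    queue_after = max(state.queue + download - CHUNK_SEC, 0.0)
    return StreamState(buffer_after, quality(bitrate), queue_after)


if __name__ == "__main__":
    ladder = [1.0, 2.5, 5.0, 8.0, 16.0]  # hypothetical bitrate ladder (Mbps)
    state = StreamState(buffer_sec=2.0, prev_quality=quality(2.5))
    for bw in [10.0, 6.0, 3.0, 12.0]:    # made-up bandwidth trace (Mbps)
        r = choose_bitrate(state, ladder, bw)
        state = advance(state, r, bw)
        print(f"bw={bw:4.1f} -> {r:4.1f} Mbps, buffer={state.buffer_sec:.2f}s, Q={state.queue:.2f}")
```

In a DRL formulation, the per-step score would typically serve as the reward, and the learned policy would replace `choose_bitrate`. Raising V pushes the selector toward higher instantaneous QoE at the cost of a larger backlog in Q(t). This is the usual trade-off in drift-plus-penalty control: an O(1/V) utility gap against an O(V) queue backlog.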


Source journal: IEEE Transactions on Multimedia (Engineering Technology: Telecommunications)

CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.