使用个性化联邦学习和融合技术的360°视频基于内容的视口预测框架

2023 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2023-07-01 DOI:10.1109/ICME55011.2023.00118

Mehdi Setayesh, V. Wong

{"title":"使用个性化联邦学习和融合技术的360°视频基于内容的视口预测框架","authors":"Mehdi Setayesh, V. Wong","doi":"10.1109/ICME55011.2023.00118","DOIUrl":null,"url":null,"abstract":"Viewport prediction is a key enabler for 360° video streaming over wireless networks. To improve the prediction accuracy, a common approach is to use a content-based viewport prediction model. Saliency detection based on traditional convolutional neural networks (CNNs) suffers from distortion due to equirectangular projection. Also, the viewers may have their own viewing behavior and are not willing to share their historical head movement with others. To address the aforementioned issues, in this paper, we first develop a saliency detection model using a spherical CNN (SPCNN). Then, we train the viewers’ head movement prediction model using personalized federated learning (PFL). Finally, we propose a content-based viewport prediction framework by integrating the video saliency map and the head orientation map of each viewer using fusion techniques. The experimental results show that our proposed framework provides higher average accuracy and precision when compared with three state-of-the-art algorithms from the literature.","PeriodicalId":321830,"journal":{"name":"2023 IEEE International Conference on Multimedia and Expo (ICME)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Content-based Viewport Prediction Framework for 360° Video Using Personalized Federated Learning and Fusion Techniques\",\"authors\":\"Mehdi Setayesh, V. Wong\",\"doi\":\"10.1109/ICME55011.2023.00118\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Viewport prediction is a key enabler for 360° video streaming over wireless networks. To improve the prediction accuracy, a common approach is to use a content-based viewport prediction model. Saliency detection based on traditional convolutional neural networks (CNNs) suffers from distortion due to equirectangular projection. Also, the viewers may have their own viewing behavior and are not willing to share their historical head movement with others. To address the aforementioned issues, in this paper, we first develop a saliency detection model using a spherical CNN (SPCNN). Then, we train the viewers’ head movement prediction model using personalized federated learning (PFL). Finally, we propose a content-based viewport prediction framework by integrating the video saliency map and the head orientation map of each viewer using fusion techniques. The experimental results show that our proposed framework provides higher average accuracy and precision when compared with three state-of-the-art algorithms from the literature.\",\"PeriodicalId\":321830,\"journal\":{\"name\":\"2023 IEEE International Conference on Multimedia and Expo (ICME)\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE International Conference on Multimedia and Expo (ICME)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICME55011.2023.00118\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME55011.2023.00118","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

视口预测是通过无线网络实现360°视频流的关键。为了提高预测精度，常用的方法是使用基于内容的视口预测模型。基于传统卷积神经网络(cnn)的显著性检测存在等矩形投影导致的失真问题。此外，观众可能有自己的观看行为，不愿意与他人分享他们的历史头部运动。为了解决上述问题，在本文中，我们首先使用球面CNN (SPCNN)开发了一个显著性检测模型。然后，我们使用个性化联邦学习(PFL)训练观众的头部运动预测模型。最后，我们提出了一个基于内容的视口预测框架，该框架使用融合技术将视频显著性图和每个观看者的头部方向图整合在一起。实验结果表明，与文献中最先进的三种算法相比，我们提出的框架具有更高的平均准确度和精度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Content-based Viewport Prediction Framework for 360° Video Using Personalized Federated Learning and Fusion Techniques

Viewport prediction is a key enabler for 360° video streaming over wireless networks. To improve the prediction accuracy, a common approach is to use a content-based viewport prediction model. Saliency detection based on traditional convolutional neural networks (CNNs) suffers from distortion due to equirectangular projection. Also, the viewers may have their own viewing behavior and are not willing to share their historical head movement with others. To address the aforementioned issues, in this paper, we first develop a saliency detection model using a spherical CNN (SPCNN). Then, we train the viewers’ head movement prediction model using personalized federated learning (PFL). Finally, we propose a content-based viewport prediction framework by integrating the video saliency map and the head orientation map of each viewer using fusion techniques. The experimental results show that our proposed framework provides higher average accuracy and precision when compared with three state-of-the-art algorithms from the literature.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2023 IEEE International Conference on Multimedia and Expo (ICME)

自引率

0.00%

发文量