VIhanceD: Efficient Video Super Resolution for Edge Devices

2023 2nd Edition of IEEE Delhi Section Flagship Conference (DELCON), Pub Date: 2023-02-24, DOI: 10.1109/DELCON57910.2023.10127275

Ashish Papanai, Siddharth Babbar, Anshuman Pandey, Harshit Kathuria, Alok Kumar Sharma, Namita Gupta



Abstract

Video Super Resolution (VSR) aims to generate high-resolution (HR) frames from corresponding low-resolution (LR) frames. It differs sharply from single-image super-resolution (SISR) because of its strong temporal dependency on misaligned supporting frames. Existing methods use RNNs to learn the temporal dependency, relying on other networks (CNNs, GANs) to predict neighboring pixels. Because of memory and processing constraints and the inference time required to up-scale LR frames, a wide variety of VSR techniques cannot be applied to mid-range and budget mobile devices. This paper presents VIhanceD, a real-time sliding-window-based network that can operate on budget smartphones and laptops while producing cutting-edge results on various video datasets. Our approach incorporates both spatial and temporal dependencies to make the up-scaled HR frames coherent and free of motion distortions. We focus on enhancing the user experience in areas with internet restrictions due to social, political, and geographical limitations. The mobile app (and PC client) provides a continuous stream of HR frames without buffering. We obtained a PSNR of 33.9 dB on the REDS dataset with a single-frame inference time of 23.6 ms, demonstrating state-of-the-art performance. Our subsequent experiments on the VIMEO-90K dataset demonstrate that the proposed method generalizes and works on both natural video frames and textual data, making it suitable for infotainment and multimedia.
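The quantities named in the abstract can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' implementation: the function names, the window radius, and the choice to clamp indices at the clip borders are all assumptions, since the abstract does not describe these details. It shows how a sliding window selects supporting frames for each target frame, how PSNR is computed between a reference frame and an estimate, and what frame rate a given per-frame latency implies.

```python
import math
from typing import List, Sequence


def sliding_windows(num_frames: int, radius: int) -> List[List[int]]:
    """For each target frame, list the indices of its window of supporting frames.

    Indices that fall outside the clip are clamped to the nearest valid frame.
    This border handling is an assumption made for illustration.
    """
    windows = []
    for t in range(num_frames):
        window = [min(max(i, 0), num_frames - 1)
                  for i in range(t - radius, t + radius + 1)]
        windows.append(window)
    return windows


def psnr(reference: Sequence[float], estimate: Sequence[float],
         peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened frames of equal size."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixel values.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def frames_per_second(latency_ms: float) -> float:
    """Throughput implied by a per-frame latency, assuming frames are processed one at a time."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    print(sliding_windows(num_frames=5, radius=1))
    print(round(frames_per_second(23.6), 1))
```

Under the one-frame-at-a-time assumption, the reported 23.6 ms per frame corresponds to roughly 42 frames per second, which is consistent with the real-time claim for common 24 to 30 fps video.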
