{"title":"基于长时间特征聚合的时空视频超分辨率","authors":"Kuanhao Chen, Zijie Yue, Miaojing Shi","doi":"10.1007/s43684-023-00051-9","DOIUrl":null,"url":null,"abstract":"<div><p>Space-time video super-resolution (STVSR) serves the purpose to reconstruct high-resolution high-frame-rate videos from their low-resolution low-frame-rate counterparts. Recent approaches utilize end-to-end deep learning models to achieve STVSR. They first interpolate intermediate frame features between given frames, then perform local and global refinement among the feature sequence, and finally increase the spatial resolutions of these features. However, in the most important feature interpolation phase, they only capture spatial-temporal information from the most adjacent frame features, ignoring modelling long-term spatial-temporal correlations between multiple neighbouring frames to restore variable-speed object movements and maintain long-term motion continuity. In this paper, we propose a novel long-term temporal feature aggregation network (LTFA-Net) for STVSR. Specifically, we design a long-term mixture of experts (LTMoE) module for feature interpolation. LTMoE contains multiple experts to extract mutual and complementary spatial-temporal information from multiple consecutive adjacent frame features, which are then combined with different weights to obtain interpolation results using several gating nets. Next, we perform local and global feature refinement using the Locally-temporal Feature Comparison (LFC) module and bidirectional deformable ConvLSTM layer, respectively. Experimental results on two standard benchmarks, Adobe240 and GoPro, indicate the effectiveness and superiority of our approach over state of the art.</p></div>","PeriodicalId":71187,"journal":{"name":"自主智能系统(英文)","volume":"3 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s43684-023-00051-9.pdf","citationCount":"0","resultStr":"{\"title\":\"Space-time video super-resolution using long-term temporal feature aggregation\",\"authors\":\"Kuanhao Chen, Zijie Yue, Miaojing Shi\",\"doi\":\"10.1007/s43684-023-00051-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Space-time video super-resolution (STVSR) serves the purpose to reconstruct high-resolution high-frame-rate videos from their low-resolution low-frame-rate counterparts. Recent approaches utilize end-to-end deep learning models to achieve STVSR. They first interpolate intermediate frame features between given frames, then perform local and global refinement among the feature sequence, and finally increase the spatial resolutions of these features. However, in the most important feature interpolation phase, they only capture spatial-temporal information from the most adjacent frame features, ignoring modelling long-term spatial-temporal correlations between multiple neighbouring frames to restore variable-speed object movements and maintain long-term motion continuity. In this paper, we propose a novel long-term temporal feature aggregation network (LTFA-Net) for STVSR. Specifically, we design a long-term mixture of experts (LTMoE) module for feature interpolation. LTMoE contains multiple experts to extract mutual and complementary spatial-temporal information from multiple consecutive adjacent frame features, which are then combined with different weights to obtain interpolation results using several gating nets. 
Next, we perform local and global feature refinement using the Locally-temporal Feature Comparison (LFC) module and bidirectional deformable ConvLSTM layer, respectively. Experimental results on two standard benchmarks, Adobe240 and GoPro, indicate the effectiveness and superiority of our approach over state of the art.</p></div>\",\"PeriodicalId\":71187,\"journal\":{\"name\":\"自主智能系统(英文)\",\"volume\":\"3 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://link.springer.com/content/pdf/10.1007/s43684-023-00051-9.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"自主智能系统(英文)\",\"FirstCategoryId\":\"1093\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s43684-023-00051-9\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"自主智能系统(英文)","FirstCategoryId":"1093","ListUrlMain":"https://link.springer.com/article/10.1007/s43684-023-00051-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Space-time video super-resolution using long-term temporal feature aggregation
Space-time video super-resolution (STVSR) aims to reconstruct high-resolution, high-frame-rate videos from their low-resolution, low-frame-rate counterparts. Recent approaches achieve STVSR with end-to-end deep learning models: they first interpolate intermediate frame features between the given frames, then perform local and global refinement over the feature sequence, and finally increase the spatial resolution of these features. However, in the most important phase, feature interpolation, they capture spatial-temporal information only from the most adjacent frame features and neglect to model the long-term spatial-temporal correlations between multiple neighbouring frames that are needed to restore variable-speed object movements and maintain long-term motion continuity. In this paper, we propose a novel long-term temporal feature aggregation network (LTFA-Net) for STVSR. Specifically, we design a long-term mixture of experts (LTMoE) module for feature interpolation. LTMoE contains multiple experts that extract mutual and complementary spatial-temporal information from multiple consecutive adjacent frame features; several gating nets then combine the experts' outputs with different weights to produce the interpolated features. Next, we perform local and global feature refinement using the Locally-temporal Feature Comparison (LFC) module and a bidirectional deformable ConvLSTM layer, respectively. Experimental results on two standard benchmarks, Adobe240 and GoPro, demonstrate the effectiveness and superiority of our approach over the state of the art.
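The LTMoE design described above follows the general gated mixture-of-experts pattern: several experts each propose an intermediate frame feature from a window of neighbouring frame features, and a gating net weights and sums their proposals per pixel. Below is a minimal PyTorch sketch of that pattern only; the module name, channel sizes, window size, number of experts, and layer choices are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of gated mixture-of-experts feature interpolation.
# Illustrative only: expert/gate architectures and all sizes are assumptions,
# not the LTMoE module from the paper.
import torch
import torch.nn as nn


class MoEFeatureInterpolation(nn.Module):
    """Interpolate an intermediate frame feature from a window of
    neighbouring frame features via convolutional experts and a gating net."""

    def __init__(self, channels: int = 64, num_inputs: int = 4, num_experts: int = 3):
        super().__init__()
        in_ch = channels * num_inputs  # neighbouring features stacked on channels
        # Each expert sees all neighbouring features and proposes an
        # intermediate feature; experts can specialise in different motions.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            for _ in range(num_experts)
        )
        # Gating net predicts per-pixel mixing weights over the experts.
        self.gate = nn.Sequential(
            nn.Conv2d(in_ch, num_experts, 3, padding=1),
            nn.Softmax(dim=1),
        )

    def forward(self, neighbour_feats: list[torch.Tensor]) -> torch.Tensor:
        x = torch.cat(neighbour_feats, dim=1)                     # (B, C*N, H, W)
        weights = self.gate(x)                                    # (B, E, H, W)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C, H, W)
        return (weights.unsqueeze(2) * outs).sum(dim=1)          # (B, C, H, W)


# Usage: interpolate one intermediate feature from 4 neighbouring features.
feats = [torch.randn(1, 64, 32, 32) for _ in range(4)]
mid = MoEFeatureInterpolation()(feats)
print(mid.shape)  # torch.Size([1, 64, 32, 32])
```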