{"title":"RCVS:视频流统一注册与融合框架","authors":"Housheng Xie;Meng Sang;Yukuan Zhang;Yang Yang;Shan Zhao;Jianbo Zhong","doi":"10.1109/TMM.2024.3443673","DOIUrl":null,"url":null,"abstract":"The infrared and visible cross-modal registration and fusion can generate more comprehensive representations of object and scene information. Previous frameworks primarily focus on addressing the modality disparities and the impact of preserving diverse modality information on the performance of registration and fusion tasks among different static image pairs. However, these frameworks overlook the practical deployment on real-world devices, particularly in the context of video streams. Consequently, the resulting video streams often suffer from instability in registration and fusion, characterized by fusion artifacts and inter-frame jitter. In light of these considerations, this paper proposes a unified registration and fusion scheme for video streams, termed RCVS. It utilizes a robust matcher and spatial-temporal calibration module to achieve stable registration of video sequences. Subsequently, RCVS combines a fast lightweight fusion network to provide stable fusion video streams for infrared and visible imaging. Additionally, we collect a infrared and visible video dataset HDO, which comprises high-quality infrared and visible video data captured across diverse scenes. Our RCVS exhibits superior performance in video stream registration and fusion tasks, adapting well to real-world demands. Overall, our proposed framework and HDO dataset offer the first effective and comprehensive benchmark in this field, solving stability and real-time challenges in infrared and visible video stream fusion while assessing different solution performances to foster development in this area.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"11031-11043"},"PeriodicalIF":8.4000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RCVS: A Unified Registration and Fusion Framework for Video Streams\",\"authors\":\"Housheng Xie;Meng Sang;Yukuan Zhang;Yang Yang;Shan Zhao;Jianbo Zhong\",\"doi\":\"10.1109/TMM.2024.3443673\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The infrared and visible cross-modal registration and fusion can generate more comprehensive representations of object and scene information. Previous frameworks primarily focus on addressing the modality disparities and the impact of preserving diverse modality information on the performance of registration and fusion tasks among different static image pairs. However, these frameworks overlook the practical deployment on real-world devices, particularly in the context of video streams. Consequently, the resulting video streams often suffer from instability in registration and fusion, characterized by fusion artifacts and inter-frame jitter. In light of these considerations, this paper proposes a unified registration and fusion scheme for video streams, termed RCVS. It utilizes a robust matcher and spatial-temporal calibration module to achieve stable registration of video sequences. Subsequently, RCVS combines a fast lightweight fusion network to provide stable fusion video streams for infrared and visible imaging. Additionally, we collect a infrared and visible video dataset HDO, which comprises high-quality infrared and visible video data captured across diverse scenes. Our RCVS exhibits superior performance in video stream registration and fusion tasks, adapting well to real-world demands. Overall, our proposed framework and HDO dataset offer the first effective and comprehensive benchmark in this field, solving stability and real-time challenges in infrared and visible video stream fusion while assessing different solution performances to foster development in this area.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"26 \",\"pages\":\"11031-11043\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2024-08-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10636834/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10636834/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
RCVS: A Unified Registration and Fusion Framework for Video Streams
The infrared and visible cross-modal registration and fusion can generate more comprehensive representations of object and scene information. Previous frameworks primarily focus on addressing the modality disparities and the impact of preserving diverse modality information on the performance of registration and fusion tasks among different static image pairs. However, these frameworks overlook the practical deployment on real-world devices, particularly in the context of video streams. Consequently, the resulting video streams often suffer from instability in registration and fusion, characterized by fusion artifacts and inter-frame jitter. In light of these considerations, this paper proposes a unified registration and fusion scheme for video streams, termed RCVS. It utilizes a robust matcher and spatial-temporal calibration module to achieve stable registration of video sequences. Subsequently, RCVS combines a fast lightweight fusion network to provide stable fusion video streams for infrared and visible imaging. Additionally, we collect a infrared and visible video dataset HDO, which comprises high-quality infrared and visible video data captured across diverse scenes. Our RCVS exhibits superior performance in video stream registration and fusion tasks, adapting well to real-world demands. Overall, our proposed framework and HDO dataset offer the first effective and comprehensive benchmark in this field, solving stability and real-time challenges in infrared and visible video stream fusion while assessing different solution performances to foster development in this area.
期刊介绍:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.