RCVS: A Unified Registration and Fusion Framework for Video Streams

IF 8.4; Zone 1, Computer Science; Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia Pub Date : 2024-08-14 DOI:10.1109/TMM.2024.3443673

Housheng Xie;Meng Sang;Yukuan Zhang;Yang Yang;Shan Zhao;Jianbo Zhong

Volume 26, pages 11031-11043. Journal Article. Full text: https://ieeexplore.ieee.org/document/10636834/

Citations: 0

Abstract

Infrared and visible cross-modal registration and fusion can generate more comprehensive representations of object and scene information. Previous frameworks focus primarily on static image pairs, addressing modality disparities and the effect that preserving diverse modality information has on registration and fusion performance. However, these frameworks overlook practical deployment on real-world devices, particularly for video streams. Consequently, the resulting video streams often suffer from unstable registration and fusion, characterized by fusion artifacts and inter-frame jitter. In light of these considerations, this paper proposes a unified registration and fusion scheme for video streams, termed RCVS. It uses a robust matcher and a spatial-temporal calibration module to achieve stable registration of video sequences. RCVS then incorporates a fast, lightweight fusion network to provide stable fused video streams for infrared and visible imaging. Additionally, we collect an infrared and visible video dataset, HDO, which comprises high-quality infrared and visible video data captured across diverse scenes. RCVS exhibits superior performance in video stream registration and fusion tasks and adapts well to real-world demands. Overall, the proposed framework and HDO dataset offer the first effective and comprehensive benchmark in this field, solving the stability and real-time challenges of infrared and visible video stream fusion while assessing the performance of different solutions to foster development in this area.


Source journal: IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Articles published: 576

Review time: 5.5 months

About the journal: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsors.