{"title":"可调可见光和红外图像融合","authors":"Boxiong Wu;Jiangtao Nie;Wei Wei;Lei Zhang;Yanning Zhang","doi":"10.1109/TCSVT.2024.3449638","DOIUrl":null,"url":null,"abstract":"The visible and infrared image fusion (VIF) method aims to utilize the complementary information between these two modalities to synthesize a new image containing richer information. Although it has been extensively studied, the synthesized image that has the best visual results is difficult to reach consensus since users have different opinions. To address this problem, we propose an adjustable VIF framework termed AdjFusion, which introduces a global controlling coefficient into VIF to enforce it can interact with users. Within AdjFusion, a semantic-aware modulation module is proposed to transform the global controlling coefficient into a semantic-aware controlling coefficient, which provides pixel-wise guidance for AdjFusion considering both interactivity and semantic information within visible and infrared images. In addition, the introduced global controlling coefficient not only can be utilized as an external interface for interaction with users but also can be easily customized by the downstream tasks (e.g., VIF-based detection and segmentation), which can help to select the best fusion result for the downstream tasks. Taking advantage of this, we further propose a lightweight adaptation module for AdjFusion to learn the global controlling coefficient to be suitable for the downstream tasks better. Experimental results demonstrate the proposed AdjFusion can 1) provide ways to dynamically synthesize images to meet the diverse demands of users; and 2) outperform the previous state-of-the-art methods on both VIF-based detection and segmentation tasks, with the constructed lightweight adaptation method. Our code will be released after accepted at \n<uri>https://github.com/BearTo2/AdjFusion</uri>\n.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"34 12","pages":"13463-13477"},"PeriodicalIF":8.3000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adjustable Visible and Infrared Image Fusion\",\"authors\":\"Boxiong Wu;Jiangtao Nie;Wei Wei;Lei Zhang;Yanning Zhang\",\"doi\":\"10.1109/TCSVT.2024.3449638\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The visible and infrared image fusion (VIF) method aims to utilize the complementary information between these two modalities to synthesize a new image containing richer information. Although it has been extensively studied, the synthesized image that has the best visual results is difficult to reach consensus since users have different opinions. To address this problem, we propose an adjustable VIF framework termed AdjFusion, which introduces a global controlling coefficient into VIF to enforce it can interact with users. Within AdjFusion, a semantic-aware modulation module is proposed to transform the global controlling coefficient into a semantic-aware controlling coefficient, which provides pixel-wise guidance for AdjFusion considering both interactivity and semantic information within visible and infrared images. 
In addition, the introduced global controlling coefficient not only can be utilized as an external interface for interaction with users but also can be easily customized by the downstream tasks (e.g., VIF-based detection and segmentation), which can help to select the best fusion result for the downstream tasks. Taking advantage of this, we further propose a lightweight adaptation module for AdjFusion to learn the global controlling coefficient to be suitable for the downstream tasks better. Experimental results demonstrate the proposed AdjFusion can 1) provide ways to dynamically synthesize images to meet the diverse demands of users; and 2) outperform the previous state-of-the-art methods on both VIF-based detection and segmentation tasks, with the constructed lightweight adaptation method. Our code will be released after accepted at \\n<uri>https://github.com/BearTo2/AdjFusion</uri>\\n.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"34 12\",\"pages\":\"13463-13477\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2024-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10646495/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10646495/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract: The visible and infrared image fusion (VIF) method aims to exploit the complementary information between the two modalities to synthesize a new image containing richer information. Although VIF has been extensively studied, it is difficult to reach a consensus on which synthesized image has the best visual quality, since users hold different preferences. To address this problem, we propose an adjustable VIF framework, termed AdjFusion, which introduces a global controlling coefficient into VIF so that the fusion process can interact with users. Within AdjFusion, a semantic-aware modulation module transforms the global controlling coefficient into a semantic-aware controlling coefficient, providing pixel-wise guidance for AdjFusion that accounts for both interactivity and the semantic information within the visible and infrared images. In addition, the introduced global controlling coefficient can not only serve as an external interface for user interaction but also be easily customized by downstream tasks (e.g., VIF-based detection and segmentation), which helps select the best fusion result for those tasks. Taking advantage of this, we further propose a lightweight adaptation module for AdjFusion that learns a global controlling coefficient better suited to the downstream task. Experimental results demonstrate that the proposed AdjFusion can 1) dynamically synthesize images to meet the diverse demands of users, and 2) outperform previous state-of-the-art methods on both VIF-based detection and segmentation tasks when equipped with the constructed lightweight adaptation module. Our code will be released upon acceptance at https://github.com/BearTo2/AdjFusion.
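The abstract describes the mechanism only at a high level: a scalar global coefficient is expanded into a pixel-wise, semantically conditioned weight map that steers the fusion, and a lightweight adaptation head can predict that coefficient for a downstream task instead of a user. The sketch below is a minimal PyTorch illustration of how such a design could look under those assumptions; it is not the paper's actual architecture, and every name in it (SemanticAwareModulation, CoefficientAdapter, the layer sizes) is a hypothetical stand-in.

```python
import torch
import torch.nn as nn


class SemanticAwareModulation(nn.Module):
    """Hypothetical sketch: turn a scalar global coefficient alpha into a
    pixel-wise fusion-weight map conditioned on both input modalities."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        # Inputs: 1 visible channel + 1 infrared channel + 1 broadcast alpha plane.
        self.net = nn.Sequential(
            nn.Conv2d(3, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # pixel-wise weights constrained to [0, 1]
        )

    def forward(self, vis: torch.Tensor, ir: torch.Tensor,
                alpha: torch.Tensor) -> torch.Tensor:
        # Broadcast the scalar coefficient into a full-resolution plane.
        a = alpha.view(-1, 1, 1, 1).expand(-1, 1, *vis.shape[-2:])
        # Predict a semantic-aware, pixel-wise controlling coefficient.
        w = self.net(torch.cat([vis, ir, a], dim=1))
        # Convex combination of the two modalities, steered per pixel.
        return w * vis + (1.0 - w) * ir


class CoefficientAdapter(nn.Module):
    """Hypothetical lightweight head that predicts the global coefficient
    from simple image statistics, so a downstream task (trained with its own
    loss) can pick its preferred operating point instead of a human user."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 1)

    def forward(self, vis: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        stats = torch.stack(
            [vis.mean(dim=(1, 2, 3)), ir.mean(dim=(1, 2, 3))], dim=1)
        return torch.sigmoid(self.fc(stats)).squeeze(1)  # alpha in (0, 1)


# Example: a user-chosen coefficient steers the fusion result; sweeping
# alpha from 0 to 1 would trade infrared emphasis against visible emphasis.
vis, ir = torch.rand(4, 1, 256, 256), torch.rand(4, 1, 256, 256)
fused = SemanticAwareModulation()(vis, ir, torch.full((4,), 0.3))
```

In this reading, the user-facing and task-facing modes differ only in where alpha comes from: a slider-style constant in the interactive case, or the output of the adapter trained end-to-end with the detection or segmentation loss in the automated case. How the actual paper conditions the weight map on semantics is richer than the two-layer stand-in above.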
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.