{"title":"CaPaT: Cross-Aware Paired-Affine Transformation for Multimodal Data Fusion Network","authors":"Jinping Wang;Hao Chen;Xiaofei Zhang;Weiwei Song","doi":"10.1109/LGRS.2025.3560931","DOIUrl":null,"url":null,"abstract":"This letter proposes a cross-aware paired-affine transformation (CaPaT) network for multimodal data fusion tasks. Unlike existing networks that employ weight-sharing or indirect interaction strategies, the CaPaT introduces a direct feature interaction paradigm that significantly improves the transfer efficiency of feature fusion while reducing the number of model parameters. Specifically, this letter, respectively, splits multimodal data along the channel domain. It synthesizes specific group channels and opposite residual channels as data pairs to generate refined features, achieving direct interaction among multimodal features. Next, a scaling attention module is conducted on the refined feature pair for confidence map generation. Then, this letter multiplies confidence maps by their corresponding feature pairs, determining a more reasonable and significant margin feature representation. Finally, a classifier is conducted on the transformation features to output the final class labels. Experimental results demonstrate that the CaPaT achieves superior classification performance with fewer parameters than state-of-the-art methods.","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10973085/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
This letter proposes a cross-aware paired-affine transformation (CaPaT) network for multimodal data fusion tasks. Unlike existing networks that employ weight-sharing or indirect interaction strategies, CaPaT introduces a direct feature interaction paradigm that significantly improves the transfer efficiency of feature fusion while reducing the number of model parameters. Specifically, the multimodal data are split along the channel dimension, and each modality's specific group channels are combined with the opposite modality's residual channels to form data pairs, from which refined features are generated; this achieves direct interaction among multimodal features. Next, a scaling attention module is applied to the refined feature pairs to generate confidence maps. The confidence maps are then multiplied by their corresponding feature pairs, yielding a more reasonable and discriminative feature representation. Finally, a classifier is applied to the transformed features to output the final class labels. Experimental results demonstrate that CaPaT achieves superior classification performance with fewer parameters than state-of-the-art methods.
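The abstract only outlines the fusion flow, so the following is a minimal PyTorch sketch of how the described steps could fit together: channel splitting, cross-modal pairing of group and residual channels, a scaling attention that produces confidence maps, confidence-weighted multiplication, and a classifier. All module names, the 50/50 channel split, and the sigmoid-based attention are assumptions for illustration; the paper's actual CaPaT design may differ.

```python
# Hedged sketch of the fusion flow described in the abstract (not the authors' code).
import torch
import torch.nn as nn


class ScalingAttention(nn.Module):
    """Assumed scaling attention: a 1x1 conv followed by a sigmoid that
    yields a per-pixel, per-channel confidence map."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.proj(x))


class PairedAffineFusion(nn.Module):
    """Sketch of the cross-aware pairing idea: split each modality along the
    channel dimension, pair one modality's group channels with the other
    modality's residual channels, weight each pair by its confidence map,
    and classify the fused result."""
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        assert channels % 2 == 0
        self.attn_a = ScalingAttention(channels)
        self.attn_b = ScalingAttention(channels)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(2 * channels, num_classes),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # Split both modalities along the channel domain.
        a1, a2 = torch.chunk(feat_a, 2, dim=1)
        b1, b2 = torch.chunk(feat_b, 2, dim=1)
        # Pair specific group channels with the opposite modality's residual channels.
        pair_a = torch.cat([a1, b2], dim=1)
        pair_b = torch.cat([b1, a2], dim=1)
        # Confidence maps from the scaling attention, multiplied back onto the pairs.
        fused_a = self.attn_a(pair_a) * pair_a
        fused_b = self.attn_b(pair_b) * pair_b
        # Classifier on the transformed (concatenated) features.
        return self.classifier(torch.cat([fused_a, fused_b], dim=1))


if __name__ == "__main__":
    # Two modalities with matching spatial size, e.g. HSI and LiDAR feature maps.
    hsi = torch.randn(4, 64, 16, 16)
    lidar = torch.randn(4, 64, 16, 16)
    model = PairedAffineFusion(channels=64, num_classes=10)
    print(model(hsi, lidar).shape)  # torch.Size([4, 10])
```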