Urban informal settlements interpretation via a novel multi-modal Kolmogorov–Arnold fusion network by exploring hierarchical features from remote sensing and street view images
{"title":"Urban informal settlements interpretation via a novel multi-modal Kolmogorov–Arnold fusion network by exploring hierarchical features from remote sensing and street view images","authors":"Hongyang Niu, Runyu Fan, Jiajun Chen, Zijian Xu, Ruyi Feng","doi":"10.1016/j.srs.2025.100208","DOIUrl":null,"url":null,"abstract":"<div><div>Urban informal settlements (UIS) interpretation has important scientific value for achieving urban sustainable development. Recent research on UIS interpretation tasks mainly includes the single-modality method, which uses remote sensing images, and the multi-modality method which uses remote sensing and geospatial data. However, from a single remote sensing perspective, the inter-class similarities, and a regional mixture of complex geo-objects from a bird-eye perspective of UIS areas make UIS interpretation extremely challenging. The current multi-modal methods cannot fully explore the modality-specific features within the modality or ignore the modality-correlation features between different modalities. To address these issues, this study proposed a novel multi-modal Kolmogorov–Arnold fusion network, namely KANFusion, to explore the modality-specific features within the modality and fuse the modality-correlation features between different modalities to boost UIS interpretation using remote sensing and street view images. The proposed KANFusion model employs the Kolmogorov–Arnold Network (KAN) instead of the conventional MLP structure to enhance the model-fitting capability of heterogeneous modality-specific features and uses a novel Multi-level Feature Fusion Module with KAN block (MFF) to fuse the hierarchical modality-specific and modality-fusion features from remote sensing and street view images for better UIS interpretation performance. We conducted extensive experiments on the manually annotated ChinaUIS dataset of eight megacities in China and a public <span><math><mrow><msup><mrow><mi>S</mi></mrow><mrow><mn>2</mn></mrow></msup><mi>U</mi><mi>V</mi></mrow></math></span> dataset and compared the proposed KANFusion with other state-of-the-art methods. The experimental results confirmed the superiority of the proposed KANFusion. This work is available in <span><span>https://github.com/cyg-nhyang/KANFusion</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":101147,"journal":{"name":"Science of Remote Sensing","volume":"11 ","pages":"Article 100208"},"PeriodicalIF":5.7000,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science of Remote Sensing","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666017225000148","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENVIRONMENTAL SCIENCES","Score":null,"Total":0}
Abstract
Urban informal settlements (UIS) interpretation has important scientific value for achieving urban sustainable development. Recent research on UIS interpretation mainly includes single-modality methods, which use remote sensing images, and multi-modality methods, which use remote sensing together with geospatial data. However, from the single remote sensing perspective, the inter-class similarities and the regional mixture of complex geo-objects seen in a bird's-eye view of UIS areas make UIS interpretation extremely challenging. Current multi-modal methods either fail to fully explore the modality-specific features within each modality or ignore the modality-correlation features between different modalities. To address these issues, this study proposes a novel multi-modal Kolmogorov–Arnold fusion network, namely KANFusion, which explores the modality-specific features within each modality and fuses the modality-correlation features between different modalities to boost UIS interpretation using remote sensing and street view images. The proposed KANFusion model employs the Kolmogorov–Arnold Network (KAN) instead of the conventional MLP structure to enhance the model's capability to fit heterogeneous modality-specific features, and uses a novel Multi-level Feature Fusion Module with KAN block (MFF) to fuse the hierarchical modality-specific and modality-fusion features from remote sensing and street view images for better UIS interpretation performance. We conducted extensive experiments on the manually annotated ChinaUIS dataset covering eight megacities in China and on the public S²UV dataset, and compared the proposed KANFusion with other state-of-the-art methods. The experimental results confirmed the superiority of the proposed KANFusion. The code is available at https://github.com/cyg-nhyang/KANFusion.
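To make the architecture described above more concrete, the snippet below is a minimal, hedged PyTorch sketch of the two ideas the abstract names: a KAN-style layer (learnable edge functions in place of an MLP's fixed activations, approximated here with Gaussian radial bases) and a multi-level fusion of remote sensing and street view feature vectors. It is not the authors' released implementation (see the linked repository); the module names, the radial-basis parameterization, and the feature dimensions are assumptions made purely for illustration.

```python
# Illustrative sketch only, NOT the KANFusion code from
# https://github.com/cyg-nhyang/KANFusion. Edge functions are approximated
# with Gaussian radial bases; dimensions and module names are assumed.
import torch
import torch.nn as nn


class KANLayer(nn.Module):
    """One Kolmogorov-Arnold layer: each input-output edge carries a learnable
    1-D function, approximated here by a mixture of Gaussian radial bases."""

    def __init__(self, in_dim, out_dim, num_bases=8):
        super().__init__()
        # Fixed basis centres on [-1, 1]; only the mixing coefficients are learned.
        self.register_buffer("centers", torch.linspace(-1.0, 1.0, num_bases))
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, num_bases) * 0.1)
        self.norm = nn.LayerNorm(in_dim)

    def forward(self, x):                      # x: (batch, in_dim)
        x = torch.tanh(self.norm(x))           # squash inputs into the basis range
        # Evaluate Gaussian bases: (batch, in_dim, num_bases)
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / 0.1)
        # Sum the learnable edge functions over inputs and bases -> (batch, out_dim)
        return torch.einsum("bik,oik->bo", phi, self.coef)


class MultiLevelKANFusion(nn.Module):
    """Fuses per-level remote sensing (RS) and street view (SV) feature vectors
    with KAN layers, then merges the level-wise results for classification."""

    def __init__(self, level_dims=(256, 512), hidden=128, num_classes=2):
        super().__init__()
        self.level_fusers = nn.ModuleList(
            [KANLayer(2 * d, hidden) for d in level_dims]
        )
        self.head = KANLayer(hidden * len(level_dims), num_classes)

    def forward(self, rs_feats, sv_feats):     # lists of (batch, dim) tensors
        fused = [f(torch.cat([r, s], dim=1))
                 for f, r, s in zip(self.level_fusers, rs_feats, sv_feats)]
        return self.head(torch.cat(fused, dim=1))


if __name__ == "__main__":
    rs = [torch.randn(4, 256), torch.randn(4, 512)]   # dummy hierarchical RS features
    sv = [torch.randn(4, 256), torch.randn(4, 512)]   # dummy hierarchical SV features
    logits = MultiLevelKANFusion()(rs, sv)
    print(logits.shape)                               # torch.Size([4, 2])
```

The key design point the sketch tries to convey is that, unlike an MLP whose nonlinearity is fixed, each edge of a KAN layer learns its own activation function, which is what the paper credits with better fitting of heterogeneous modality-specific features before the level-wise fusion.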