Multimodal air-quality prediction: A multimodal feature fusion network based on shared-specific modal feature decoupling
Authors: Xiaoxia Chen, Zhen Wang, Fangyan Dong, Kaoru Hirota
Journal: Environmental Modelling & Software, volume 192, Article 106553 (impact factor 4.6; JCR Q1, Computer Science, Interdisciplinary Applications)
Publication date: 2025-06-17
Publication type: Journal Article
DOI: 10.1016/j.envsoft.2025.106553
URL: https://www.sciencedirect.com/science/article/pii/S1364815225002373
Severe air pollution threatens human health, so accurate air-quality prediction is essential for pollution control. Spatiotemporal networks that combine sequence models with graph structures dominate current methods, but prior work neglects multimodal data fusion as a way to enhance feature representation. This study addresses the limited spatial coverage of ground monitoring alone by combining ground observations with remote sensing data, which capture the global distribution of air quality. We propose a Shared-Specific Modality Decoupling-based Spatiotemporal Multimodal Fusion Network for air-quality prediction, comprising: (1) feature extractors for remote sensing images and ground monitoring data, (2) a decoupling module that separates shared features from modality-specific features, and (3) a hierarchical attention-graph convolution fusion module. The framework achieves effective multimodal fusion by disentangling cross-modal dependencies while preserving the unique characteristics of each modality. Evaluations on two real-world datasets show better performance than baseline models, supporting the value of multimodal integration for spatiotemporal air-quality forecasting.
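The three components listed in the abstract can be illustrated with a toy sketch. The abstract gives no equations, so everything below is an assumption for illustration rather than the authors' implementation: the feature extractors are stood in for by random feature matrices, decoupling is done with plain linear projections, the losses follow a common shared/private pattern (a similarity term on the shared parts and an orthogonality term between shared and specific parts), and the hierarchical attention step is replaced by simple averaging and concatenation before one graph-convolution-style neighbour average. All function and variable names are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def decouple(features, w_shared, w_specific):
    """Project one modality's features into a shared part and a specific part."""
    return matmul(features, w_shared), matmul(features, w_specific)

def similarity_loss(shared_a, shared_b):
    """Mean squared difference; small when the two shared parts agree."""
    diffs = [(x - y) ** 2 for ra, rb in zip(shared_a, shared_b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def orthogonality_loss(shared, specific):
    """Squared Frobenius norm of shared^T @ specific; small when they do not overlap."""
    cross = matmul(transpose(shared), specific)
    return sum(v * v for row in cross for v in row)

def normalize_adjacency(adj):
    """Add self-loops, then row-normalize so each row sums to 1."""
    n = len(adj)
    loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in loops]

def fuse(shared_g, shared_r, spec_g, spec_r, adj):
    """Average the shared parts, concatenate with both specific parts,
    then average each station's features over its graph neighbours."""
    shared = [[(x + y) / 2 for x, y in zip(a, b)] for a, b in zip(shared_g, shared_r)]
    node_feats = [s + g + r for s, g, r in zip(shared, spec_g, spec_r)]
    return matmul(normalize_adjacency(adj), node_feats)

rng = random.Random(0)
n_stations, d_in, d_out = 4, 6, 3

# Stand-ins for the outputs of the two feature extractors (assumed shapes).
ground = random_matrix(n_stations, d_in, rng)   # ground monitoring features
remote = random_matrix(n_stations, d_in, rng)   # remote sensing features

w_shared = random_matrix(d_in, d_out, rng)      # one projection used by both modalities
w_ground = random_matrix(d_in, d_out, rng)      # ground-specific projection
w_remote = random_matrix(d_in, d_out, rng)      # remote-sensing-specific projection

# Hypothetical station graph: a simple chain of four stations.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

shared_g, spec_g = decouple(ground, w_shared, w_ground)
shared_r, spec_r = decouple(remote, w_shared, w_remote)
fused = fuse(shared_g, shared_r, spec_g, spec_r, adj)

decoupling_loss = (similarity_loss(shared_g, shared_r)
                   + orthogonality_loss(shared_g, spec_g)
                   + orthogonality_loss(shared_r, spec_r))
print(len(fused), len(fused[0]), decoupling_loss)
```

In a trained model the projections would be learned by minimizing the decoupling loss together with a forecasting loss; here they are random, so the sketch only shows the data flow and the shapes involved.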
About the journal:
Environmental Modelling & Software publishes contributions, in the form of research articles, reviews and short communications, on recent advances in environmental modelling and/or software. The aim is to improve our capacity to represent, understand, predict or manage the behaviour of environmental systems at all practical scales, and to communicate those improvements to a wide scientific and professional audience.