Zhiping Yu;Chenyang Liu;Chuyu Zhong;Zhengxia Zou;Zhenwei Shi
IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5, published 2025-04-30.
DOI: 10.1109/LGRS.2025.3565817
https://ieeexplore.ieee.org/document/10980342/
Multi-Grained Guided Diffusion for Quantity-Controlled Remote Sensing Object Generation
Accurate object counts represent essential semantic information in remote sensing imagery, significantly impacting applications such as traffic monitoring and urban planning. Despite recent advances in text-to-image (T2I) generation for remote sensing, existing methods still struggle to precisely control the number of object instances in generated images. To address this challenge, we propose a novel method, multi-grained guided diffusion (MGDiff). During training, unlike previous methods that relied solely on latent-space noise constraints, MGDiff imposes constraints at three distinct granularities: latent pixel, global counting, and spatial distribution. The multi-grained guidance mechanism matches the quantity prompts with object spatial layouts in the feature space, enabling our model to achieve precise control over object quantities. To benchmark this new task, we present Levir-QCG, a dataset comprising 10,504 remote sensing images across five object categories, annotated with precise object counts and segmentation masks. We conducted extensive experiments on the Levir-QCG dataset to benchmark our method against previous approaches. Compared to previous models, MGDiff achieves an approximately 40% improvement in counting accuracy while maintaining higher visual fidelity and strong zero-shot generalization. To the best of our knowledge, this is the first work to study accurate object-quantity control in remote sensing T2I generation. The dataset and code will be publicly available at https://github.com/YZPioneer/MGDiff
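The abstract names three guidance granularities (latent pixel, global counting, spatial distribution) but does not give their loss forms. As a purely illustrative sketch, one way such terms could be combined during training is a weighted sum of a standard diffusion noise-prediction loss, a count-deviation penalty, and a layout-agreement term; the function name, term definitions, and weights below are all hypothetical and may differ from the actual MGDiff formulation.

```python
import numpy as np

def multi_grained_loss(pred_noise, true_noise,
                       pred_layout, gt_layout,
                       pred_count, target_count,
                       w_pixel=1.0, w_count=0.1, w_spatial=0.5):
    """Hypothetical combination of the three guidance granularities
    mentioned in the abstract (not the paper's actual formulation).

    - latent pixel: standard diffusion noise-prediction MSE
    - global counting: penalty on deviation from the prompted count
    - spatial distribution: agreement between predicted and reference layouts
    """
    l_pixel = np.mean((pred_noise - true_noise) ** 2)    # latent-pixel term
    l_count = abs(pred_count - target_count)             # global counting term
    l_spatial = np.mean((pred_layout - gt_layout) ** 2)  # spatial-distribution term
    return w_pixel * l_pixel + w_count * l_count + w_spatial * l_spatial

# Example: perfect noise prediction and layout, but count off by two,
# leaves only the weighted counting penalty.
noise = np.zeros((4, 4))
layout = np.ones((4, 4))
loss = multi_grained_loss(noise, noise, layout, layout,
                          pred_count=7, target_count=5)
```

The weighted-sum structure is the standard way auxiliary objectives are attached to a diffusion loss; the paper's actual mechanism additionally matches quantity prompts to spatial layouts in feature space, which a scalar sum like this does not capture.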