Multi-Grained Guided Diffusion for Quantity-Controlled Remote Sensing Object Generation

Zhiping Yu, Chenyang Liu, Chuyu Zhong, Zhengxia Zou, Zhenwei Shi
{"title":"多粒度制导扩散用于量控遥感目标生成","authors":"Zhiping Yu;Chenyang Liu;Chuyu Zhong;Zhengxia Zou;Zhenwei Shi","doi":"10.1109/LGRS.2025.3565817","DOIUrl":null,"url":null,"abstract":"Accurate object counts represent essential semantical information in remote sensing imagery, significantly impacting applications such as traffic monitoring and urban planning. Despite the recent advances in text-to-image (T2I) generation in remote sensing, existing methods still face challenges in precisely controlling the number of object instances in generated images. To address this challenge, we propose a novel method, multi-grained guided diffusion (MGDiff). During training, unlike previous methods that relied solely on latent-space noise constraints, MGDiff imposes constraints at three distinct granularities: latent pixel, global counting, and spatial distribution. The multi-grained guidance mechanism matches the quantity prompts with object spatial layouts in the feature space, enabling our model to achieve precise control over object quantities. To benchmark this new task, we present Levir-QCG, a dataset comprising 10504 remote sensing images across five object categories, annotated with precise object counts and segmentation masks. We conducted extensive experiments to benchmark our method against previous methods on the Levir-QCG dataset. Compared to previous models, the MGDiff achieves an approximately +40% improvement in counting accuracy while maintaining higher visual fidelity and strong zero-shot generalization. To the best of our knowledge, this is the first work to research accurate object quantity control in remote sensing T2I generation. The dataset and code will be publicly available at <uri>https://github.com/YZPioneer/MGDiff</uri>","PeriodicalId":91017,"journal":{"name":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","volume":"22 ","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Grained Guided Diffusion for Quantity-Controlled Remote Sensing Object Generation\",\"authors\":\"Zhiping Yu;Chenyang Liu;Chuyu Zhong;Zhengxia Zou;Zhenwei Shi\",\"doi\":\"10.1109/LGRS.2025.3565817\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Accurate object counts represent essential semantical information in remote sensing imagery, significantly impacting applications such as traffic monitoring and urban planning. Despite the recent advances in text-to-image (T2I) generation in remote sensing, existing methods still face challenges in precisely controlling the number of object instances in generated images. To address this challenge, we propose a novel method, multi-grained guided diffusion (MGDiff). During training, unlike previous methods that relied solely on latent-space noise constraints, MGDiff imposes constraints at three distinct granularities: latent pixel, global counting, and spatial distribution. The multi-grained guidance mechanism matches the quantity prompts with object spatial layouts in the feature space, enabling our model to achieve precise control over object quantities. To benchmark this new task, we present Levir-QCG, a dataset comprising 10504 remote sensing images across five object categories, annotated with precise object counts and segmentation masks. We conducted extensive experiments to benchmark our method against previous methods on the Levir-QCG dataset. 
Compared to previous models, the MGDiff achieves an approximately +40% improvement in counting accuracy while maintaining higher visual fidelity and strong zero-shot generalization. To the best of our knowledge, this is the first work to research accurate object quantity control in remote sensing T2I generation. The dataset and code will be publicly available at <uri>https://github.com/YZPioneer/MGDiff</uri>\",\"PeriodicalId\":91017,\"journal\":{\"name\":\"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society\",\"volume\":\"22 \",\"pages\":\"1-5\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10980342/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE geoscience and remote sensing letters : a publication of the IEEE Geoscience and Remote Sensing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10980342/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Accurate object counts represent essential semantic information in remote sensing imagery, significantly impacting applications such as traffic monitoring and urban planning. Despite recent advances in text-to-image (T2I) generation in remote sensing, existing methods still struggle to precisely control the number of object instances in generated images. To address this challenge, we propose a novel method, multi-grained guided diffusion (MGDiff). During training, unlike previous methods that rely solely on latent-space noise constraints, MGDiff imposes constraints at three distinct granularities: latent pixel, global counting, and spatial distribution. The multi-grained guidance mechanism matches the quantity prompts with object spatial layouts in the feature space, enabling our model to achieve precise control over object quantities. To benchmark this new task, we present Levir-QCG, a dataset comprising 10,504 remote sensing images across five object categories, annotated with precise object counts and segmentation masks. We conducted extensive experiments comparing our method against previous methods on the Levir-QCG dataset. Compared to previous models, MGDiff achieves an approximately 40% improvement in counting accuracy while maintaining higher visual fidelity and strong zero-shot generalization. To the best of our knowledge, this is the first work to study accurate object quantity control in remote sensing T2I generation. The dataset and code will be publicly available at https://github.com/YZPioneer/MGDiff.
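To make the three-granularity guidance concrete, below is a minimal, hypothetical PyTorch sketch of a training loss that combines the three constraint levels named in the abstract (latent pixel, global counting, spatial distribution). The abstract gives no formulas, so the function name multi_grained_loss, the density-map representation, and the weights w_count and w_spatial are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of a multi-grained training loss in the spirit of MGDiff.
# Every tensor shape, weight, and helper below is an assumption; the paper's
# actual formulation is not reproduced here.
import torch
import torch.nn.functional as F


def multi_grained_loss(
    eps_pred: torch.Tensor,      # predicted latent noise, (B, C, h, w)
    eps_true: torch.Tensor,      # ground-truth latent noise, (B, C, h, w)
    density_pred: torch.Tensor,  # predicted object-density map, (B, 1, H, W)
    density_gt: torch.Tensor,    # density map rendered from segmentation masks
    count_target: torch.Tensor,  # object count parsed from the quantity prompt, (B,)
    w_count: float = 0.1,        # assumed weighting factors
    w_spatial: float = 0.1,
) -> torch.Tensor:
    # (1) Latent-pixel constraint: the usual diffusion noise-prediction MSE.
    loss_latent = F.mse_loss(eps_pred, eps_true)

    # (2) Global counting constraint: the integral of the predicted density
    #     map should match the count requested in the text prompt.
    count_pred = density_pred.sum(dim=(1, 2, 3))
    loss_count = F.mse_loss(count_pred, count_target.float())

    # (3) Spatial-distribution constraint: the predicted density map should
    #     agree with the layout derived from the annotated masks.
    loss_spatial = F.mse_loss(density_pred, density_gt)

    return loss_latent + w_count * loss_count + w_spatial * loss_spatial


if __name__ == "__main__":
    B, C, h, w, H, W = 2, 4, 32, 32, 64, 64
    loss = multi_grained_loss(
        torch.randn(B, C, h, w), torch.randn(B, C, h, w),
        torch.rand(B, 1, H, W), torch.rand(B, 1, H, W),
        torch.tensor([7.0, 12.0]),
    )
    print(loss.item())

In a formulation like this, the counting term ties the quantity requested in the prompt to the integral of a spatial map, which is the kind of matching between quantity prompts and object spatial layouts that the abstract describes.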