Spatial-Aware Remote Sensing Image Generation From Spatial Relationship Descriptions
Yaxian Lei; Xiaochong Tong; Chunping Qiu; Haoshuai Song; Congzhou Guo; He Li
IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5, 2025. DOI: 10.1109/LGRS.2025.3542169. Published 2025-02-14. https://ieeexplore.ieee.org/document/10887344/
Abstract
Recent advances in stable diffusion models have revolutionized text-to-image generation. However, these models struggle with spatial relationship comprehension in remote sensing (RS) scenarios, limiting their ability to generate spatially accurate imagery. We present a novel framework for generating RS images from spatial relationship descriptions with precise spatial control. Our approach introduces a two-stage pipeline: first, a spatial relationship semantic structuring model converts formalized spatial relationship descriptions into controlled layouts, and second, an enhanced diffusion model incorporates positional prompts and a layout attention mechanism to generate the final image. The positional prompts explicitly encode spatial information, while the layout attention mechanism enables focused region learning. Comprehensive experiments demonstrate that our method achieves superior performance compared with state-of-the-art approaches in both spatial accuracy and image quality.
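The abstract describes a layout attention mechanism that concentrates generation on the regions specified by the structured layout. As a rough illustration only, and not the authors' implementation, the following Python sketch shows one common way such region control can be realized: normalized bounding boxes from a structured layout are converted into binary region masks, which in turn become additive attention biases that suppress positions outside each object's region. The function names, box format, and example layout are all assumptions made for this illustration.

```python
# Illustrative sketch (not the paper's code): turning a structured layout into
# per-object region masks and additive attention biases of the kind a layout
# attention mechanism could use to focus learning on specified regions.

import torch


def layout_to_masks(boxes, height, width):
    """Convert normalized [x0, y0, x1, y1] boxes into binary region masks.

    boxes: list of (label, [x0, y0, x1, y1]) with coordinates in [0, 1].
    Returns a dict mapping each label to an (height, width) {0, 1} mask.
    """
    masks = {}
    for label, (x0, y0, x1, y1) in boxes:
        mask = torch.zeros(height, width)
        r0, r1 = int(y0 * height), max(int(y1 * height), int(y0 * height) + 1)
        c0, c1 = int(x0 * width), max(int(x1 * width), int(x0 * width) + 1)
        mask[r0:r1, c0:c1] = 1.0
        masks[label] = mask
    return masks


def masked_attention_bias(masks, keep_value=0.0, suppress_value=-1e4):
    """Turn region masks into additive attention biases: positions outside a
    region receive a large negative bias so attention concentrates inside it."""
    biases = {}
    for label, mask in masks.items():
        bias = torch.full_like(mask, suppress_value)
        bias[mask > 0] = keep_value
        biases[label] = bias
    return biases


if __name__ == "__main__":
    # Hypothetical layout: "a storage tank north of a road" mapped to two boxes.
    layout = [("storage tank", [0.30, 0.05, 0.60, 0.35]),
              ("road",         [0.00, 0.55, 1.00, 0.75])]
    masks = layout_to_masks(layout, height=64, width=64)
    biases = masked_attention_bias(masks)
    # Fraction of positions each object is allowed to attend to.
    print({k: (v == 0).float().mean().item() for k, v in biases.items()})
```

In an actual diffusion backbone, such biases would typically be added to the cross-attention logits between image positions and the tokens of the corresponding object prompt; the exact mechanism in the paper may differ.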