Stable rivers: A case study in the application of text-to-image generative models for Earth sciences

IF 2.8 | Zone 3, Earth Sciences | Q2, GEOGRAPHY, PHYSICAL

Earth Surface Processes and Landforms Pub Date : 2024-08-20 DOI:10.1002/esp.5961

C. Kupferschmidt, A.D. Binns, K.L. Kupferschmidt, G.W. Taylor

Journal Article. Earth Surface Processes and Landforms, volume 49, issue 13, pages 4213-4232. https://onlinelibrary.wiley.com/doi/10.1002/esp.5961

Citations: 0

Abstract

Text-to-image (TTI) generative models can generate photorealistic images from a text-string input. However, the rapid increase in their use has raised questions about fairness and bias, and most research to date has focused on social and cultural areas rather than domain-specific considerations. We conducted a case study for the Earth sciences, focusing on fluvial geomorphology, in which we evaluated subject-area-specific biases in the training data and downstream model performance of Stable Diffusion (v1.5). In addition to perpetuating Western biases, the training data overrepresented scenic locations, such as famous rivers and waterfalls, and seriously underrepresented or overrepresented many morphological and environmental terms. Despite the biased training data, we found that with careful prompting the Stable Diffusion model was able to generate photorealistic synthetic river images that reproduced many important environmental and morphological characteristics. Furthermore, conditional control techniques, such as the use of condition maps with ControlNet, were effective for providing additional constraints on output images. Despite the great potential of TTI models in the Earth sciences, we advocate caution in sensitive applications and domain-specific reviews of training data and image generation biases to mitigate the perpetuation of existing biases.
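The abstract does not say how term representation in the training data was measured. As a rough illustration only, and not the authors' method, the sketch below computes the share of image captions that mention each term of interest. The function name `term_shares`, the sample captions, and the term list are all hypothetical.

```python
import re
from collections import Counter


def term_shares(captions, terms):
    """Return, for each term, the fraction of captions that mention it."""
    counts = Counter()
    for caption in captions:
        text = caption.lower()
        for term in terms:
            # Whole-word match so that "bar" does not match "barn".
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", text):
                counts[term] += 1
    total = len(captions)
    return {term: (counts[term] / total if total else 0.0) for term in terms}


# Hypothetical captions and terms, for illustration only.
captions = [
    "Waterfall on a famous river at sunset",
    "Braided river channel seen from above",
    "Tourists at a waterfall",
    "Meandering river through farmland",
]
terms = ["waterfall", "braided", "meandering", "anabranching"]
shares = term_shares(captions, terms)

for term, share in shares.items():
    print(f"{term}: {share:.2f}")
```

Comparing such shares against an expert's expectation of how common each feature is would be one simple way to flag terms as over- or underrepresented. What counts as "expected" is a domain judgment that this sketch does not address.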




Source journal

Earth Surface Processes and Landforms (Earth Sciences: Geosciences, Multidisciplinary)

CiteScore: 6.40

Self-citation rate: 12.10%

Articles published: 215

Review time: 4 months

Journal description: Earth Surface Processes and Landforms is an interdisciplinary international journal concerned with the interactions between surface processes and landforms and landscapes that lead to physical, chemical and biological changes, and which in turn create current landscapes and the geological record of past landscapes. Its focus is core to both the physical geography and geological communities, and also to the wider geosciences.