Comparing UNet configurations for anthropogenic geomorphic feature extraction from land surface parameters.

IF 2.6 | Zone 3, Multidisciplinary Journals | Q1 MULTIDISCIPLINARY SCIENCES

PLoS ONE Pub Date : 2025-06-10 eCollection Date: 2025-01-01 DOI:10.1371/journal.pone.0325904

Sarah Farhadpour, Aaron E Maxwell

PLoS ONE 20(6): e0325904. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12151443/pdf/

Citations: 0

Abstract



The application of deep learning for semantic segmentation has revolutionized image analysis, particularly in the geospatial and medical fields. UNet, an encoder-decoder architecture, has been suggested to be particularly effective. However, limitations such as small sample sizes and class imbalance in anthropogenic geomorphic feature extraction tasks have necessitated the exploration of advanced modifications to improve model performance. This study investigates a variety of architectural modifications to base UNet including replacing the rectified linear unit (ReLU) activation function with leaky ReLU or swish; incorporating residual connections within the encoder blocks, decoder blocks, and bottleneck; inserting squeeze and excitation modules into the encoder or attention gate modules along the skip connections; replacing the default bottleneck layer with one that incorporates dilated convolution; and using a MobileNetV2 architecture as an encoder backbone. Unique geomorphic datasets derived from high spatial resolution lidar data were used to evaluate the performance of these modified UNet architectures on the tasks of mapping agricultural terraces, mine benches, and valley fill faces. The results were further analyzed across varying training sample sizes (50, 100, 250, 500, and the full training set). Our results suggest that the incorporation of advanced modules can enhance segmentation performance, particularly in scenarios involving limited training data or complex geomorphic landscapes. However, differences were minimal when larger training set sizes were used (e.g., above 500 image chips) and the base UNet architecture was generally adequate. This research contributes valuable insights into the optimization of UNet-based models for anthropogenic geomorphic feature extraction and provides a foundation for future work aimed at improving the accuracy and efficiency of deep learning approaches in geospatial applications. We argue that one of the positive attributes of UNet is that it can be treated as a general framework that can easily be modified.
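The first modification listed in the abstract swaps ReLU for leaky ReLU or swish. As a rough illustration of how the three activations differ on negative inputs, here is a minimal pure-Python sketch. The negative slope of 0.01 and the swish parameter `beta = 1.0` are common defaults assumed here; the abstract does not state the values the authors used, and the function names are illustrative only.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: zero for negative inputs."""
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: keeps a small slope for negative inputs.

    negative_slope=0.01 is an assumed, commonly used default.
    """
    return x if x > 0 else negative_slope * x


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x), with beta=1.0 assumed here."""
    z = beta * x
    # Numerically stable sigmoid for both signs of z.
    if z >= 0:
        sigmoid = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        sigmoid = e / (1.0 + e)
    return x * sigmoid


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):+.4f}  "
              f"leaky_relu={leaky_relu(x):+.4f}  swish={swish(x):+.4f}")
```

Unlike ReLU, both alternatives return a nonzero value for negative inputs, so their gradient there is generally nonzero as well. That is one commonly cited motivation for trying them in place of ReLU, although the abstract itself does not give the authors' reasoning.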


Source journal

PLoS ONE (Biology)

CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months

About the journal: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:

* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage