Comparing UNet configurations for anthropogenic geomorphic feature extraction from land surface parameters.
Sarah Farhadpour, Aaron E Maxwell
PLoS ONE, vol. 20, no. 6, e0325904. Published 2025-06-10. DOI: 10.1371/journal.pone.0325904
The application of deep learning for semantic segmentation has revolutionized image analysis, particularly in the geospatial and medical fields. UNet, an encoder-decoder architecture, has been suggested to be particularly effective. However, limitations such as small sample sizes and class imbalance in anthropogenic geomorphic feature extraction tasks have necessitated the exploration of advanced modifications to improve model performance. This study investigates a variety of architectural modifications to base UNet including replacing the rectified linear unit (ReLU) activation function with leaky ReLU or swish; incorporating residual connections within the encoder blocks, decoder blocks, and bottleneck; inserting squeeze and excitation modules into the encoder or attention gate modules along the skip connections; replacing the default bottleneck layer with one that incorporates dilated convolution; and using a MobileNetV2 architecture as an encoder backbone. Unique geomorphic datasets derived from high spatial resolution lidar data were used to evaluate the performance of these modified UNet architectures on the tasks of mapping agricultural terraces, mine benches, and valley fill faces. The results were further analyzed across varying training sample sizes (50, 100, 250, 500, and the full training set). Our results suggest that the incorporation of advanced modules can enhance segmentation performance, particularly in scenarios involving limited training data or complex geomorphic landscapes. However, differences were minimal when larger training set sizes were used (e.g., above 500 image chips) and the base UNet architecture was generally adequate. This research contributes valuable insights into the optimization of UNet-based models for anthropogenic geomorphic feature extraction and provides a foundation for future work aimed at improving the accuracy and efficiency of deep learning approaches in geospatial applications. 
We argue that one of the positive attributes of UNet is that it can be treated as a general framework that can easily be modified.
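As a rough illustration of the first modification listed in the abstract (swapping ReLU for leaky ReLU or swish), the three activation functions can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: the function names, the `ACTIVATIONS` lookup table, the `apply_activation` helper, and the leaky ReLU slope of 0.01 are all assumptions chosen for illustration, since the abstract does not state them.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Leaky ReLU: like ReLU, but negatives are scaled by a small slope.

    The default slope of 0.01 is an assumption, not a value from the paper.
    """
    return x if x >= 0.0 else negative_slope * x


def swish(x: float) -> float:
    """Swish: x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x is negative, so exp(x) is in (0, 1)
    return x * e / (1.0 + e)


# Hypothetical lookup table so a model configuration can name its activation.
ACTIVATIONS = {"relu": relu, "leaky_relu": leaky_relu, "swish": swish}


def apply_activation(name: str, values: list[float]) -> list[float]:
    """Apply the named activation element-wise to a list of values."""
    fn = ACTIVATIONS[name]
    return [fn(v) for v in values]
```

Selecting the activation by name, as in `apply_activation("swish", values)`, mirrors the abstract's point that UNet can be treated as a general framework in which individual components are swapped out.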
About the journal:
PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open-access—freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage