Effect of Hyperparameters on DeepLabV3+ Performance to Segment Water Bodies in RGB Images

Q2 Social Sciences

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. Pub Date: 2023-09-05. DOI: 10.5194/isprs-archives-xlviii-m-3-2023-203-2023

Onteddu Chaitanya Reddy, Illa Dinesh Kumar, Pingali Sathvika, Sajith Variyar, Sowmya, R. Sivanpillai


Citations: 0



Abstract. Deep Learning (DL) networks used in image segmentation tasks must be trained with input images and corresponding masks that identify the target features in them. DL networks learn by iteratively adjusting the weights of interconnected layers through backpropagation, a process that computes gradients and minimizes a loss function. This allows a network to learn patterns and relationships in the data, enabling it to make predictions or classifications on new, unseen data. Training any DL network requires specifying hyperparameter values such as input image size, batch size, and number of epochs. Failing to choose suitable values can increase training time or result in incomplete learning. The aim of this study was to evaluate the effect of input image size and batch size on the performance of DeepLabV3+, using Sentinel-2A/B RGB images and labels obtained from Kaggle. We trained the DeepLabV3+ network six times, with input images of 128 × 128 pixels and 256 × 256 pixels, each at batch sizes of 4, 8, and 16. Each model was trained for 100 epochs to ensure that the loss curve reached saturation and the model converged to a stable solution. The predicted masks generated by each model were compared with the corresponding test masks using accuracy, precision, recall, and F1 score. The results showed that an image size of 256 × 256 with a batch size of 4 achieved the highest performance. They also suggest that a larger input image size improved DeepLabV3+ performance.
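To make the experimental grid and the evaluation step concrete, here is a minimal Python sketch; it is not the authors' code. It assumes binary masks (1 = water, 0 = background) and scores every pixel of a test mask once, pooling the counts rather than averaging per image; the paper's exact aggregation is not stated here. The names `CONFIGS` and `segmentation_metrics` are hypothetical.

```python
import itertools
from typing import Dict, Sequence

# The six training runs described in the abstract: 2 input sizes x 3 batch sizes.
# (Hypothetical constant name; values taken from the abstract.)
CONFIGS = list(itertools.product((128, 256), (4, 8, 16)))


def segmentation_metrics(pred: Sequence[Sequence[int]],
                         truth: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Pixel-wise accuracy, precision, recall and F1 for binary water masks.

    Assumption: both masks are 2-D grids of 0/1 values with the same shape,
    and every pixel is counted once (no per-image averaging).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")
    tp = fp = fn = tn = 0
    for pred_row, true_row in zip(pred, truth):
        if len(pred_row) != len(true_row):
            raise ValueError("masks must have the same number of columns")
        for p, t in zip(pred_row, true_row):
            p, t = bool(p), bool(t)
            if p and t:
                tp += 1          # water predicted as water
            elif p:
                fp += 1          # background predicted as water
            elif t:
                fn += 1          # water missed by the model
            else:
                tn += 1          # background predicted as background
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    for image_size, batch_size in CONFIGS:
        print(f"run: {image_size}x{image_size} px, batch size {batch_size}")
    # Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative, 2 true negatives.
    predicted = [[1, 1, 0], [0, 1, 0]]
    ground_truth = [[1, 0, 0], [0, 1, 1]]
    print(segmentation_metrics(predicted, ground_truth))
```

Precision penalizes background pixels labelled as water, recall penalizes missed water pixels, and F1 is their harmonic mean. Accuracy alone can look high when most pixels are background, which is why all four metrics are reported.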


Source journal

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Social Sciences: Geography, Planning and Development)

CiteScore: 1.70

Self-citation rate: 0.00%

Articles published: 949

Review time: 16 weeks