Siti Raihanah Abdani, M. A. Zulkifley, Muhammad Nazir Siham, Nurshafiza Zanal Abiddin, N. A. Aziz
Title: Paddy Fields Segmentation using Fully Convolutional Network with Pyramid Pooling Module
Venue: 2020 IEEE 5th International Symposium on Telecommunication Technologies (ISTT)
Publication date: 2020-11-09
DOI: https://doi.org/10.1109/ISTT50966.2020.9279341
Citations: 1
Abstract
One initiative by the Malaysian government to reduce dependency on foreign staple food stock is to subsidize rice farmers. The amount each farmer receives, which covers support for fertilizers, seeds, and machinery, correlates directly with the cultivated paddy area. Hence, it is important for the government to identify exactly which areas have been cultivated so that the subsidies reach the targeted groups. Currently, surveying is done manually by field observers, which is costly and tedious. A remote sensing approach is therefore proposed for an automated surveying system that semantically segments satellite images of the paddy fields into the intended classes. A deep learning approach is adopted in which a fully convolutional network with a spatial pyramid pooling (SPP) module is designed to segment the images into four classes: cultivated areas, uncultivated areas, backgrounds, and others. The encoder backbone of the network is based on VGG16, and the SPP module comprises four parallel branches of multiscale feature maps. Up-sampling is done through two layers of transposed convolution, after which the output is resized to match the input image. The results show that the proposed network with an SPP kernel set of 4x4, 5x5, 6x6, and 7x7 returns the best performance, with a mean accuracy of 0.9869 and a Jaccard index of 0.8326. The model faced its biggest training challenge when clouds obstructed the surface information, which made the affected areas uninformative. In the future, the network can be further improved by adding feed-forward layers and residual skip connections, which help reduce the vanishing gradient problem.
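The two reported metrics can be illustrated with a small sketch. The abstract does not give the exact formulas, so the following assumes that the Jaccard index is the per-class intersection over union averaged over the four classes, and that mean accuracy is the per-class (TP + TN) / total averaged the same way. The class ids, function names, and toy label maps are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Class names come from the abstract; the integer ids 0..3 are an assumption.
CLASSES = ["cultivated", "uncultivated", "background", "others"]


def per_class_scores(
    pred: Sequence[Sequence[int]],
    truth: Sequence[Sequence[int]],
    num_classes: int,
) -> Tuple[List[float], List[float]]:
    """Return (jaccard, accuracy) lists with one entry per class present."""
    flat_pred = [p for row in pred for p in row]
    flat_true = [t for row in truth for t in row]
    total = len(flat_true)
    jaccard: List[float] = []
    accuracy: List[float] = []
    for c in range(num_classes):
        pairs = list(zip(flat_pred, flat_true))
        tp = sum(1 for p, t in pairs if p == c and t == c)
        fp = sum(1 for p, t in pairs if p == c and t != c)
        fn = sum(1 for p, t in pairs if p != c and t == c)
        tn = total - tp - fp - fn
        union = tp + fp + fn
        if union == 0:
            continue  # class absent from both maps: skip it in the averages
        jaccard.append(tp / union)
        accuracy.append((tp + tn) / total)
    return jaccard, accuracy


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Toy 4x4 label maps (made up for illustration); two pixels are mislabeled.
truth = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
pred = [[0, 0, 1, 1], [0, 1, 1, 1], [2, 2, 3, 3], [2, 2, 2, 3]]

jac, acc = per_class_scores(pred, truth, len(CLASSES))
print(f"mean Jaccard:  {mean(jac):.4f}")
print(f"mean accuracy: {mean(acc):.4f}")
```

On this toy input the mean Jaccard index is 0.7750 and the mean accuracy is 0.9375. Under these assumed definitions, accuracy counts true negatives while the Jaccard index does not, which is one plausible reason the reported accuracy (0.9869) sits well above the reported Jaccard index (0.8326).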