Yangyang Fu, Ruoque Shen, Chaoqing Song, Jie Dong, Wei Han, Tao Ye, Wenping Yuan
Exploring the effects of training samples on the accuracy of crop mapping with machine learning algorithm
Science of Remote Sensing, Volume 7, Article 100081. Published 2023-06-01. DOI: 10.1016/j.srs.2023.100081
https://www.sciencedirect.com/science/article/pii/S2666017223000068
Citations: 1
Abstract
Machine learning algorithms are frequently used for crop classification and have been applied to map the distribution of various crops at regional and national scales. Previous studies have emphasized that the number of training samples strongly influences the classification accuracy of machine learning algorithms, which has motivated extensive efforts to collect training samples. Taking winter wheat as an example, this study challenges that principle by selecting training samples with the time-weighted dynamic time warping (TWDTW) method, and finds that classification accuracy depends strongly on the representativeness and proportion of the training samples rather than on their quantity. As the representativeness of the training samples increases, i.e., as they more comprehensively reflect the characteristics of winter wheat, classification accuracy improves continually. The best classification accuracy is achieved when the training samples of winter wheat and non-winter wheat are selected according to the ratio of their statistical areas. In contrast, only slight differences were found in overall accuracy (91.26% and 90.74%), producer’s accuracy (86.33% and 86.65%), and user’s accuracy (97.37% and 96.01%) when using 1,000 and 10,000 training samples. Overall, this study demonstrates that the characteristics of training samples have a great impact on the classification accuracy of machine learning algorithms, and that the training samples generated by the TWDTW method are reliable for crop mapping.
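
The two ideas in the abstract, scoring pixels with TWDTW and splitting the sample budget by area ratio, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one common TWDTW formulation, in which a logistic penalty on the date gap is added to the local DTW cost, and it aligns the two series end to end for simplicity. The function names, the `steepness` and `midpoint` values, and the example series are all hypothetical and are not taken from the study.

```python
import math


def time_weight(day_gap, steepness=0.1, midpoint=50.0):
    """Logistic time penalty: close to 0 for small date gaps, close to 1 for large ones.

    steepness and midpoint are illustrative assumptions, not values from the study.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (day_gap - midpoint)))


def twdtw_distance(ref_values, ref_days, pix_values, pix_days):
    """Simplified TWDTW distance between a reference curve and a pixel time series.

    Each series is a list of index values (for example NDVI) with a matching
    list of observation days. Both series are aligned from start to end.
    """
    n, m = len(ref_values), len(pix_values)
    if n == 0 or m == 0:
        raise ValueError("both series must contain at least one observation")
    # acc[i][j] holds the cheapest accumulated cost of aligning the first i
    # reference points with the first j pixel points.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            value_cost = abs(ref_values[i - 1] - pix_values[j - 1])
            time_cost = time_weight(abs(ref_days[i - 1] - pix_days[j - 1]))
            best_previous = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = value_cost + time_cost + best_previous
    return acc[n][m]


def allocate_samples(total_samples, wheat_area, other_area):
    """Split a training-sample budget between two classes by their area ratio."""
    total_area = wheat_area + other_area
    if total_samples < 0 or wheat_area < 0 or other_area < 0 or total_area <= 0:
        raise ValueError("sample count and areas must be non-negative, with a positive total area")
    n_wheat = round(total_samples * wheat_area / total_area)
    return n_wheat, total_samples - n_wheat


if __name__ == "__main__":
    days = [1, 60, 120, 180]
    reference = [0.2, 0.4, 0.8, 0.3]   # hypothetical winter wheat curve
    similar = [0.25, 0.45, 0.75, 0.3]  # pixel that resembles the reference
    flat = [0.3, 0.3, 0.3, 0.3]        # pixel with no seasonal peak
    print(twdtw_distance(reference, days, similar, days))
    print(twdtw_distance(reference, days, flat, days))
    print(allocate_samples(1000, 30.0, 70.0))  # (300, 700)
```

In this sketch a lower distance means the pixel follows the reference curve more closely in both shape and timing, so pixels could be ranked by distance when choosing candidate samples. How the study actually thresholds or ranks pixels is not stated in the abstract.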