Title: The illusion of success: Test set disproportion causes inflated accuracy in remote sensing mapping research
Authors: Yuanjun Xiao, Zhen Zhao, Jingfeng Huang, Ran Huang, Wei Weng, Gerui Liang, Chang Zhou, Qi Shao, Qiyu Tian
Journal: International journal of applied earth observation and geoinformation : ITC journal, volume 135, Article 104256
Publication date: 2024-11-16
Publication type: Journal Article
DOI: 10.1016/j.jag.2024.104256
URL: https://www.sciencedirect.com/science/article/pii/S1569843224006125
Impact factor: 7.6 (JCR Q1, Remote Sensing)
Citations: 0
Abstract
In remote sensing mapping studies, selecting an appropriate test set is critical for evaluating results accurately. An imprecise accuracy assessment can be misleading and can fail to validate the applicability of mapping products. Starting from the WHU-Hi-HanChuan dataset, this paper revealed the impact of test-set sample size ratios on accuracy metrics by generating a series of test sets with varying ratios of positive to negative samples and using them to evaluate the same map. A rigorous approach to accuracy assessment was proposed, and a tea plantation mapping example was used to demonstrate the process and to analyse potential issues in traditional approaches. A scale factor (λ) was constructed to measure the discrepancy in sample size ratios between test sets and actual conditions. Accuracy adjustment formulas were developed and applied to adjust the accuracy of 42 previous maps based on λ. Results showed that a higher ratio of positive to negative samples in the test set led to inflated user's accuracy (UA), F1-score (F1), and overall accuracy (OA), but had little impact on producer's accuracy. When the ratio matched that of the target area, the UA, F1, and OA closely matched the true values, indicating that the proportion of positive and negative samples in the test set should be consistent with the actual situation. The accuracies reported by traditional approaches, including test sets sampled from labelled data and 5-fold cross-validation, were far from the true accuracy and could not reflect the performance of the map. Among the 42 previous maps, nearly 60% had UAs overestimated by 10%, and 9.5% had UA and F1 deviations of more than 25%. The conclusions of this study provide a clear caution for future mapping research and assist in producing and identifying truly excellent maps.
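The ratio effect described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's method or data: the hypothetical map detects 95% of positive pixels and 90% of negative pixels, the "true" positive-to-negative ratio is taken to be 1:9, and the `scale` variable is a simple stand-in for the idea of a ratio discrepancy rather than the paper's definition of λ or its adjustment formulas. With these particular rates OA falls as negatives are added; if the true-negative rate exceeded the true-positive rate, OA would move the other way.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Return UA, PA, F1 and OA for a binary confusion matrix."""
    ua = tp / (tp + fp)                    # user's accuracy (precision)
    pa = tp / (tp + fn)                    # producer's accuracy (recall)
    f1 = 2 * ua * pa / (ua + pa)
    oa = (tp + tn) / (tp + fp + fn + tn)   # overall accuracy
    return {"UA": ua, "PA": pa, "F1": f1, "OA": oa}


def evaluate(n_pos, n_neg, tpr=0.95, tnr=0.90):
    """Expected metrics for one fixed map on a test set of a given composition.

    tpr and tnr are hypothetical per-class detection rates of the map;
    they do not change when the test set composition changes.
    """
    tp = n_pos * tpr
    fn = n_pos - tp
    tn = n_neg * tnr
    fp = n_neg - tn
    return accuracy_metrics(tp, fp, fn, tn)


def reweight_to_true_ratio(tp, fp, fn, tn, true_ratio):
    """Rescale the negative samples so the matrix reflects true_ratio.

    true_ratio is the assumed positive:negative ratio in the target area.
    scale is an illustrative stand-in, not the paper's lambda.
    """
    test_ratio = (tp + fn) / (fp + tn)
    scale = test_ratio / true_ratio
    return accuracy_metrics(tp, fp * scale, fn, tn * scale)


# Same map, two test sets: balanced (1:1) and matching the assumed area (1:9).
balanced = evaluate(n_pos=1000, n_neg=1000)
realistic = evaluate(n_pos=1000, n_neg=9000)

# Adjust the balanced test set's confusion matrix to the assumed 1:9 ratio.
adjusted = reweight_to_true_ratio(tp=950, fp=100, fn=50, tn=900, true_ratio=1 / 9)

for name, result in [("1:1 test set", balanced),
                     ("1:9 test set", realistic),
                     ("1:1 adjusted to 1:9", adjusted)]:
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under these assumptions, PA is identical for both test sets because it depends only on the positive samples, while UA and F1 drop sharply once the test set contains negatives in the assumed real-world proportion. Reweighting the balanced confusion matrix recovers the same UA and F1 as the 1:9 test set.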
About the journal:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.