The illusion of success: Test set disproportion causes inflated accuracy in remote sensing mapping research

Impact factor 7.6, JCR Q1 (Remote Sensing)

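International Journal of Applied Earth Observation and Geoinformation (ITC Journal), vol. 135, Article 104256. Published 2024-11-16. DOI: 10.1016/j.jag.2024.104256
Publisher page: https://www.sciencedirect.com/science/article/pii/S1569843224006125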

Yuanjun Xiao, Zhen Zhao, Jingfeng Huang, Ran Huang, Wei Weng, Gerui Liang, Chang Zhou, Qi Shao, Qiyu Tian


Citations: 0

Abstract

In remote sensing mapping studies, choosing an appropriate test set is critical for evaluating the results accurately. An imprecise accuracy assessment can be misleading and can fail to validate the applicability of mapping products. Starting from the WHU-Hi-HanChuan dataset, this paper examined how the ratio of positive to negative samples in a test set affects accuracy metrics. It generated a series of test sets with varying positive-to-negative ratios and used each of them to evaluate the same map. A rigorous approach to accuracy assessment was proposed. A tea plantation mapping example was used to demonstrate the process and to analyse potential issues in traditional approaches. A scale factor (λ) was constructed to measure the discrepancy between the sample size ratio in a test set and the ratio under actual conditions. Accuracy adjustment formulas based on λ were developed and applied to adjust the reported accuracy of 42 previous maps. A higher ratio of positive to negative samples in the test set inflated user's accuracy (UA), F1-score (F1) and overall accuracy (OA), but had little impact on producer's accuracy (PA). When the ratio matched that of the target area, UA, F1 and OA closely matched the true values. This indicates that the proportion of positive and negative samples in the test set should be consistent with the actual situation. Traditional approaches, including sampling the test set from the labelled data and 5-fold cross-validation, reported accuracies far from the true accuracy that could not reflect the performance of the map. Among the 42 previous maps, nearly 60% had UAs overestimated by 10%, and 9.5% had UA and F1 deviations of more than 25%. These conclusions offer a clear caution for future mapping research and help in producing and identifying truly excellent maps.
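The inflation mechanism can be illustrated with expected confusion-matrix counts. The sketch below makes several assumptions. A hypothetical map is described by a fixed producer's accuracy and false-positive rate, and these do not change when the test set changes. λ is defined here as the test set's positive:negative ratio divided by the true ratio in the target area. The correction simply reweights negative reference samples by λ. The abstract does not give the paper's actual formulas, so `expected_counts`, `metrics`, `scale_factor`, `adjust` and all numeric settings are illustrative assumptions rather than the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Binary confusion-matrix counts for the target (positive) class."""
    tp: float
    fp: float
    fn: float
    tn: float


def expected_counts(pa: float, fpr: float, n_pos: float, n_neg: float) -> Counts:
    """Expected counts when one fixed map is checked against n_pos positive
    and n_neg negative reference samples. pa (producer's accuracy) and fpr
    (share of true negatives mapped as positive) describe the map itself,
    so they do not depend on how the test set is composed."""
    return Counts(tp=pa * n_pos, fn=(1 - pa) * n_pos,
                  fp=fpr * n_neg, tn=(1 - fpr) * n_neg)


def metrics(c: Counts) -> dict:
    ua = c.tp / (c.tp + c.fp)   # user's accuracy (precision): depends on n_neg via fp
    pa = c.tp / (c.tp + c.fn)   # producer's accuracy (recall): positives only
    f1 = 2 * ua * pa / (ua + pa)
    # overall accuracy: a prevalence-weighted mean of PA and specificity,
    # so it also shifts when the positive:negative ratio changes
    oa = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    return {"UA": ua, "PA": pa, "F1": f1, "OA": oa}


def scale_factor(n_pos_test: float, n_neg_test: float, true_pos_share: float) -> float:
    """Hypothetical lambda: test-set pos:neg ratio over the true pos:neg ratio.
    true_pos_share is the (assumed known) fraction of the area that is positive."""
    r_test = n_pos_test / n_neg_test
    r_true = true_pos_share / (1 - true_pos_share)
    return r_test / r_true


def adjust(c: Counts, lam: float) -> Counts:
    """Reweight negative samples by lambda so that the effective pos:neg ratio
    equals the true one. This is one possible correction, not necessarily the
    paper's adjustment formulas."""
    return Counts(tp=c.tp, fn=c.fn, fp=lam * c.fp, tn=lam * c.tn)


if __name__ == "__main__":
    PA, FPR = 0.90, 0.12     # hypothetical map behaviour
    TRUE_SHARE = 0.10        # hypothetical: target class covers 10% of the area
    for n_pos, n_neg in [(100, 900), (500, 500), (900, 100)]:
        c = expected_counts(PA, FPR, n_pos, n_neg)
        raw = metrics(c)
        lam = scale_factor(n_pos, n_neg, TRUE_SHARE)
        adj = metrics(adjust(c, lam))
        print(f"{n_pos}:{n_neg}  lambda={lam:5.2f}  "
              f"UA {raw['UA']:.3f} -> {adj['UA']:.3f}  "
              f"F1 {raw['F1']:.3f} -> {adj['F1']:.3f}  "
              f"OA {raw['OA']:.3f} -> {adj['OA']:.3f}  "
              f"PA {raw['PA']:.3f}")
```

Algebraically, UA = 1 / (1 + (FPR/PA)·(n_neg/n_pos)), so UA rises whenever the test set over-represents positives, and F1 rises with it. PA involves only positive samples and stays at 0.90. With these hypothetical settings, the 1:1 and 9:1 test sets report a much higher UA than the 1:9 set that mirrors the assumed landscape. After reweighting by λ, all three test sets yield the same UA, F1 and OA as the 1:9 set.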




Source journal

International Journal of Applied Earth Observation and Geoinformation (ITC Journal)
Subject areas: Global and Planetary Change; Management, Monitoring, Policy and Law; Earth-Surface Processes; Computers in Earth Sciences

CiteScore: 12.00
Self-citation rate: 0.00%
Review time: 77 days

Journal description: The International Journal of Applied Earth Observation and Geoinformation publishes original papers that use Earth observation data for natural resource and environmental inventory and management. These data come primarily from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. The journal covers natural resources such as forests, agricultural land, soils and water. It also covers environmental concerns such as biodiversity, land degradation and hazards, and explores conceptual and data-driven approaches to them. Geoinformation themes within its scope include data capture, databasing, visualization, interpretation, data quality and spatial uncertainty.