Robustness comparison of three statistical methods commonly used in Proficiency Tests

IF 1.0 | Zone 4 (Engineering & Technology) | JCR Q4, CHEMISTRY, ANALYTICAL

Accreditation and Quality Assurance | Pub Date: 2025-07-24 | DOI: 10.1007/s00769-025-01669-3

Willem P. Cofino, Steven Crum, Winnie van Vark, Jaap Molenaar

Accreditation and Quality Assurance, vol. 30, no. 5, pp. 507-519 (2025)

Citations: 0

Abstract

This study compares the robustness to outliers of the Algorithm A and Q/Hampel methods described in ISO 13528 and of the NDA method used in the WEPAL/Quasimeme PT schemes. First, the comparison analyses Empirical Influence Functions. This analysis shows that NDA down-weights outliers most strongly, followed by Q/Hampel and then Algorithm A. Second, we evaluated the methods on simulated datasets: stylized datasets of 30 and 200 values drawn from a normal distribution N(1,1) were contaminated with 5%-45% of data drawn from 32 different distributions. NDA consistently produced mean estimates closest to the true values, whereas Algorithm A showed the largest deviations. The percentage differences between the mean estimates of Q/Hampel and Algorithm A relative to NDA turned out to be linearly proportional to the L-skewness of the dataset over a substantial interval around L-skewness = 0. Third, the relationship between these percentage differences and L-skewness was analysed for more than 33,000 datasets from WEPAL/Quasimeme, reproducing the linear relationships observed in the simulation study. The percentage differences between the mean estimates were also projected onto L-moment diagrams, stratified by sample size. Across the four classes distinguished, the results show consistent and interpretable patterns. The three methods showed similar robustness to tail weight (L-kurtosis), but NDA was markedly more robust to asymmetry, particularly in smaller samples. When L-skewness approaches zero, the estimates of the three methods differ by less than 2%. These findings demonstrate that NDA is more robust than Q/Hampel and Algorithm A. However, NDA has a lower efficiency (~78%) than Q/Hampel and Algorithm A (both ~96%). Our analysis thus clearly shows the robustness-versus-efficiency trade-off that is typical of this class of statistical methods.
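The abstract compares the methods through influence functions, simulated contamination and L-moment ratios but gives no formulas. The following is a minimal sketch of three of these ingredients, under stated assumptions. Algorithm A follows the iteration described in ISO 13528 (start from the median and 1.483 × MAD, winsorize at x* ± 1.5 s*, recompute the mean and 1.134 × the standard deviation), but the numeric convergence tolerance is a choice made here rather than the standard's significant-figure rule. The empirical influence function is computed as Tukey's sensitivity curve, (n + 1)[T(x_1, ..., x_n, x) - T(x_1, ..., x_n)], which may differ from the exact definition used by the authors. L-skewness and L-kurtosis are the sample L-moment ratios t3 and t4 from Hosking's unbiased probability-weighted moments. NDA and Q/Hampel are not implemented, and the contaminated dataset is a hypothetical illustration, not the study's simulation design.

```python
import math
import random
import statistics


def algorithm_a(x, tol=1e-10, max_iter=1000):
    """Robust mean x* and standard deviation s* following ISO 13528 Algorithm A."""
    p = len(x)
    x_star = statistics.median(x)
    s_star = 1.483 * statistics.median([abs(xi - x_star) for xi in x])
    if s_star == 0:
        # Degenerate case (over half the values identical): not handled in this sketch.
        return x_star, s_star
    for _ in range(max_iter):
        delta = 1.5 * s_star
        # Winsorize: pull values outside x* +/- delta back to the nearest limit.
        w = [min(max(xi, x_star - delta), x_star + delta) for xi in x]
        new_x = sum(w) / p
        new_s = 1.134 * math.sqrt(sum((wi - new_x) ** 2 for wi in w) / (p - 1))
        # Assumed stopping rule: change below tol (ISO 13528 uses significant figures).
        done = (abs(new_x - x_star) <= tol * max(1.0, abs(x_star))
                and abs(new_s - s_star) <= tol * max(1.0, s_star))
        x_star, s_star = new_x, new_s
        if done:
            break
    return x_star, s_star


def sensitivity_curve(estimator, data, x):
    """Tukey's sensitivity curve (n + 1) * [T(data + [x]) - T(data)]."""
    n = len(data)
    return (n + 1) * (estimator(list(data) + [x]) - estimator(data))


def l_moment_ratios(x):
    """Sample L-skewness t3 and L-kurtosis t4 from unbiased PWMs (Hosking)."""
    xs = sorted(x)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least four observations")
    b = [0.0, 0.0, 0.0, 0.0]
    for j, v in enumerate(xs, start=1):
        # Weight for b_r is prod_{m=1..r} (j - m) / (n - m).
        w = 1.0
        for r in range(4):
            if r > 0:
                w *= (j - r) / (n - r)
            b[r] += w * v
    b0, b1, b2, b3 = (br / n for br in b)
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l3 / l2, l4 / l2


def robust_mean(d):
    return algorithm_a(d)[0]


# Hypothetical example: 30 values from N(1, 1) plus 6 right-hand outliers from N(10, 1).
rng = random.Random(1)
data = [rng.gauss(1.0, 1.0) for _ in range(30)] + [rng.gauss(10.0, 1.0) for _ in range(6)]

loc, scale = algorithm_a(data)
t3, t4 = l_moment_ratios(data)
print(f"arithmetic mean = {statistics.fmean(data):.3f}")
print(f"Algorithm A: x* = {loc:.3f}, s* = {scale:.3f}")
print(f"L-skewness t3 = {t3:.3f}, L-kurtosis t4 = {t4:.3f}")
for x in (5.0, 50.0, 500.0):
    print(f"SC({x}): Algorithm A = {sensitivity_curve(robust_mean, data, x):.3f}, "
          f"mean = {sensitivity_curve(statistics.fmean, data, x):.3f}")
```

The Algorithm A values printed for x = 50 and x = 500 coincide: once an added point lies beyond the winsorizing limit, moving it further away no longer changes x*, which is what a bounded influence function means. For the arithmetic mean the same curve grows linearly with x. Stronger down-weighting, as the abstract reports for NDA, would show up as a curve that falls back towards zero for distant points.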
We recommend that PT organizers assess this trade-off in the light of the distributional characteristics of their datasets and, if necessary, adapt the selection or parametrization of their statistical methods. For the WEPAL/Quasimeme scheme, the emphasis NDA places on robustness is an advantage, given the characteristics of the datasets usually analysed in this scheme. In addition to mean estimation, our study also evaluates the standard deviation estimates produced by Qn, Q/Hampel, Algorithm A and NDA. Qn and Q/Hampel produced similar results for datasets with larger sample sizes (N > 16), whereas Q/Hampel produced higher estimates for smaller datasets. Algorithm A agreed with the other methods for near-Gaussian datasets but gave higher estimates at higher contamination levels. NDA's standard deviation estimates generally agreed with those of Q/Hampel and Qn but were consistently lower for leptokurtic distributions. This behaviour is attributed to NDA's down-weighting of outlying observations.
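For the standard deviation comparison, the following sketch assumes that "Qn" refers to the Rousseeuw-Croux Qn scale estimator: the k-th smallest of the pairwise distances |x_i - x_j| (i < j), with h = floor(n/2) + 1 and k = h(h - 1)/2, multiplied by the consistency constant d = 1/(sqrt(2) Phi^-1(5/8)), approximately 2.219, for normal data. This naive O(n^2 log n) version omits the small-sample correction factors that some implementations apply, so its values for small N need not match those implementations exactly. It is also distinct from the ISO 13528 Q method that underlies Q/Hampel.

```python
import math
import statistics
from itertools import combinations

# Consistency constant making Qn estimate sigma for normal data:
# d = 1 / (sqrt(2) * Phi^-1(5/8)), approximately 2.219.
QN_CONSTANT = 1.0 / (math.sqrt(2.0) * statistics.NormalDist().inv_cdf(5 / 8))


def qn_scale(x):
    """Rousseeuw-Croux Qn: d times the k-th smallest |x_i - x_j|, i < j.

    h = n // 2 + 1 and k = h * (h - 1) / 2. No small-sample correction is applied.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    h = n // 2 + 1
    k = h * (h - 1) // 2
    distances = sorted(abs(a - b) for a, b in combinations(x, 2))
    return QN_CONSTANT * distances[k - 1]


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
contaminated = [1.0, 2.0, 3.0, 4.0, 1000.0]
for name, sample in (("clean", clean), ("contaminated", contaminated)):
    print(f"{name}: SD = {statistics.stdev(sample):.3f}, Qn = {qn_scale(sample):.3f}")
```

Replacing 5 by 1000 leaves Qn unchanged while the classical standard deviation grows by two orders of magnitude, which is the kind of outlier resistance the scale comparison above is concerned with.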

Source journal

Accreditation and Quality Assurance (Engineering & Technology: Analytical Chemistry)

CiteScore: 1.80

Self-citation rate: 22.20%

Review time: 6-12 weeks

Journal description: Accreditation and Quality Assurance has established itself as the leading information and discussion forum for all aspects relevant to the quality, transparency and reliability of measurement results in the chemical and biological sciences. The journal serves the information needs of researchers, practitioners and decision makers dealing with quality assurance and quality management, including the development and application of metrological principles and concepts such as traceability or measurement uncertainty, in the following fields: environment, nutrition, consumer protection, geology, metallurgy, pharmacy, forensics, clinical chemistry and laboratory medicine, and microbiology.