Willem P. Cofino, Steven Crum, Winnie van Vark, Jaap Molenaar
Accreditation and Quality Assurance, 30(5), pages 507-519. Published 2025-07-24. DOI: 10.1007/s00769-025-01669-3. Article page: https://link.springer.com/article/10.1007/s00769-025-01669-3
Robustness comparison of three statistical methods commonly used in Proficiency Tests
This study compares the robustness to outliers of three methods: Algorithm A and Q/Hampel, both outlined in ISO 13528, and the NDA method used within the WEPAL/Quasimeme PT schemes. The comparison starts with an analysis of Empirical Influence Functions, which shows that NDA applies the strongest down-weighting to outliers, followed by Q/Hampel and then Algorithm A. Second, we evaluated the methods using simulated datasets. Stylized datasets of 30 and 200 data points drawn from a normal distribution N(1,1) were contaminated with 5% to 45% of data drawn from 32 different distributions. NDA consistently produced mean estimates closest to the true values, while Algorithm A showed the largest deviations. The percentage differences between the mean estimates of Q/Hampel and Algorithm A relative to NDA turned out to be linearly proportional to the L-skewness of the dataset within a substantial interval around L-skewness = 0. Third, the relationship between the percentage differences between the mean estimates and L-skewness was analysed for over 33,000 datasets from WEPAL/Quasimeme. The linear relationships observed in the simulation study were reproduced. The percentage differences between the mean estimates were projected onto L-moment diagrams, stratified by sample size. Across the four classes discerned, the results exhibit consistent and interpretable patterns. The three methods showed similar robustness to tail weight (L-kurtosis), but NDA was markedly more robust to asymmetry, particularly in smaller samples. The three methods yield estimates that differ by less than 2% when L-skewness approaches zero. The findings demonstrate that NDA is more robust than Q/Hampel and Algorithm A. NDA exhibits a lower efficiency (~78%) than Q/Hampel and Algorithm A (both ~96%). Our analysis clearly shows the robustness versus efficiency trade-off that is typical for this kind of statistical method.
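The simulation setup and two of the quantities named above can be illustrated with a minimal sketch. It is not the authors' code: the contaminating distribution N(5,1), the function names, and the convergence tolerance are assumptions made for illustration. Algorithm A follows the commonly cited ISO 13528 recipe (median and scaled MAD as starting values, winsorizing at 1.5 s*, rescaling by 1.134), and the L-skewness uses the standard unbiased probability-weighted-moment estimators. NDA and Q/Hampel are not implemented here.

```python
import random
import statistics


def contaminated_sample(n, fraction, seed=0):
    """N(1,1) sample in which a fraction is replaced by draws from N(5,1).

    N(5,1) is a hypothetical contaminant chosen only for this illustration.
    """
    rng = random.Random(seed)
    n_bad = round(n * fraction)
    good = [rng.gauss(1.0, 1.0) for _ in range(n - n_bad)]
    bad = [rng.gauss(5.0, 1.0) for _ in range(n_bad)]
    return good + bad


def l_skewness(values):
    """Sample L-skewness t3 = l3 / l2 from probability-weighted moments."""
    x = sorted(values)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    return l3 / l2


def algorithm_a(values, tol=1e-6, max_iter=100):
    """Robust mean and standard deviation by iterated winsorization.

    The stopping rule (absolute change below tol) is an assumption.
    """
    x_star = statistics.median(values)
    s_star = 1.483 * statistics.median([abs(v - x_star) for v in values])
    for _ in range(max_iter):
        delta = 1.5 * s_star
        lo, hi = x_star - delta, x_star + delta
        # Pull values outside [lo, hi] back to the nearest limit.
        clipped = [min(max(v, lo), hi) for v in values]
        new_x = statistics.fmean(clipped)
        new_s = 1.134 * statistics.stdev(clipped)
        converged = abs(new_x - x_star) < tol and abs(new_s - s_star) < tol
        x_star, s_star = new_x, new_s
        if converged:
            break
    return x_star, s_star


if __name__ == "__main__":
    data = contaminated_sample(n=30, fraction=0.2)
    print("plain mean :", statistics.fmean(data))
    print("Algorithm A:", algorithm_a(data)[0])
    print("L-skewness :", l_skewness(data))
```

Running the script on a one-sided contaminated sample gives a rough feel for how asymmetry (non-zero L-skewness) pulls a winsorizing estimator away from the uncontaminated mean of 1.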
We recommend that PT organizers assess this trade-off in the light of the distributional characteristics of their datasets and, if necessary, adapt the selection or parametrization of the statistical methodology. For the WEPAL/Quasimeme scheme, the emphasis that the NDA method places on robustness is an advantage given the characteristics of the datasets usually analysed in this scheme. In addition to mean estimation, our study also evaluates the behaviour of the standard deviation estimates generated by Qn, Q/Hampel, Algorithm A, and NDA. Qn and Q/Hampel produced similar results in datasets with larger sample sizes (N > 16), whereas Q/Hampel produced higher estimates in smaller datasets. Algorithm A provided estimates consistent with the other methods for near-Gaussian datasets, but higher estimates at higher contamination levels. NDA's standard deviation estimates were generally in agreement with those of Q/Hampel and Qn, but consistently lower for leptokurtic distributions. This behaviour is attributed to NDA's down-weighting of outlying observations.
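To make the robust scale estimates discussed above more concrete, the following is a minimal sketch of the Qn estimator in its textbook form (Rousseeuw and Croux): the k-th smallest pairwise absolute difference, scaled by 2.2219 for consistency at the normal distribution. The function name `qn_scale` is hypothetical, the implementation is the naive O(n²) version, and the finite-sample correction factors are omitted, so it is not necessarily the exact variant evaluated in the study.

```python
import math
import statistics
from itertools import combinations


def qn_scale(values):
    """Naive Qn scale estimate without finite-sample correction."""
    n = len(values)
    h = n // 2 + 1
    k = math.comb(h, 2)  # order statistic to pick among pairwise differences
    diffs = sorted(abs(a - b) for a, b in combinations(values, 2))
    return 2.2219 * diffs[k - 1]


if __name__ == "__main__":
    clean = [1, 2, 3, 4, 5]
    with_outlier = [1, 2, 3, 4, 100]
    # The classical standard deviation reacts strongly to the single outlier,
    # while Qn depends only on the smaller pairwise differences.
    print("stdev:", statistics.stdev(clean), statistics.stdev(with_outlier))
    print("Qn   :", qn_scale(clean), qn_scale(with_outlier))
```

In this small example the outlier leaves Qn unchanged because only the largest pairwise differences are affected, which is the behaviour that makes such estimators attractive for proficiency-test data.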
About the journal:
Accreditation and Quality Assurance has established itself as the leading information and discussion forum for all aspects relevant to quality, transparency and reliability of measurement results in chemical and biological sciences. The journal serves the information needs of researchers, practitioners and decision makers dealing with quality assurance and quality management, including the development and application of metrological principles and concepts such as traceability or measurement uncertainty in the following fields: environment, nutrition, consumer protection, geology, metallurgy, pharmacy, forensics, clinical chemistry and laboratory medicine, and microbiology.