A statistical review: why average weighted accuracy, not accuracy or AUC?

Q3 Medicine

Biostatistics and Epidemiology, Pub Date: 2021-07-03, DOI: 10.1080/24709360.2021.1975255

Yunyun Jiang, Q. Pan, Ying Liu, S. Evans

Volume 5, issue 1, pages 267-286.

Citations: 1

Abstract

Sensitivity and specificity are key aspects in evaluating the performance of diagnostic tests. Accuracy and AUC are commonly used composite measures that incorporate sensitivity and specificity. Average Weighted Accuracy (AWA) is motivated by the need for a statistical measure that compares diagnostic tests from the point of view of medical costs and clinical impact, while incorporating the relevant prevalence range of the disease as well as the relative importance of false-positive versus false-negative cases. We illustrate the testing procedures in four different scenarios: (i) one diagnostic test vs. the best random test, (ii) two diagnostic tests from two independent samples, (iii) two diagnostic tests from the same sample, and (iv) more than two diagnostic tests from different or the same samples. The impacts of sample size, prevalence, and relative importance on power and average medical costs/clinical loss are examined through simulation studies. Accuracy has the highest power, while AWA provides a consistent criterion for selecting the optimal threshold and the better diagnostic test, with direct clinical interpretation. The use of AWA is illustrated on a three-arm clinical trial evaluating three different assays for detecting Neisseria gonorrhoeae and Chlamydia trachomatis in the rectum and pharynx.
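The abstract does not state a formula for AWA, so the following is only a minimal sketch under an assumed form: weighted accuracy at prevalence p is taken to be a weighted average of sensitivity and specificity with weights r·p and (1 − p), where r is the relative importance of a false negative versus a false positive, and AWA is taken to be the plain average of that quantity over an evenly spaced grid of prevalences. The function names, the grid size, and the uniform averaging are all hypothetical choices for illustration and may differ from the paper's definition.

```python
def weighted_accuracy(sens, spec, prev, r):
    """Assumed weighted accuracy at a single prevalence.

    Diseased cases get weight r * prev and non-diseased cases get
    weight (1 - prev); with r = 1 this reduces to ordinary accuracy.
    """
    if r <= 0:
        raise ValueError("r must be positive")
    w_pos = r * prev
    w_neg = 1.0 - prev
    return (w_pos * sens + w_neg * spec) / (w_pos + w_neg)


def average_weighted_accuracy(sens, spec, prev_low, prev_high, r, n_grid=1001):
    """Assumed AWA: mean weighted accuracy over an evenly spaced
    grid of prevalences between prev_low and prev_high (inclusive)."""
    if n_grid < 2:
        raise ValueError("n_grid must be at least 2")
    if not (0.0 <= prev_low <= prev_high <= 1.0):
        raise ValueError("need 0 <= prev_low <= prev_high <= 1")
    step = (prev_high - prev_low) / (n_grid - 1)
    values = [
        weighted_accuracy(sens, spec, prev_low + i * step, r)
        for i in range(n_grid)
    ]
    return sum(values) / n_grid


if __name__ == "__main__":
    # Hypothetical test characteristics, not taken from the paper.
    for r in (0.5, 1.0, 2.0):
        awa = average_weighted_accuracy(0.90, 0.70, 0.10, 0.30, r)
        print(f"r = {r}: AWA = {awa:.4f}")
```

Under this assumed form, setting r = 1 and a single prevalence recovers ordinary accuracy, which makes the contrast in the abstract concrete: accuracy fixes one prevalence and treats both error types equally, while AWA lets both vary.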


Source journal

Biostatistics and Epidemiology Medicine-Health Informatics

CiteScore

1.80

Self-citation rate

0.00%

Number of articles published