An extended McNemar test for comparing correlated proportion of positive responses

Biometrics & biostatistics international journal Pub Date : 2019-07-08 DOI:10.15406/bbij.2019.08.00281

Okeh Uchechukwu Marius, Obiora-Ilouno Happiness

Abstract

The receiver operating characteristic (ROC) curve is a standard tool for evaluating the performance of a diagnostic test whose results are measured on a continuous or ordinal scale.1 ROC methodology was first developed by electrical and radar engineers during World War II for detecting signals on the battlefield, and was taken up in signal detection theory in the 1950s.2 An ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) across all possible cut-off values, in order to support a meaningful decision. The area under the ROC curve (AUC) is a summary index of diagnostic accuracy. AUC ranges from 0 to 1 inclusive, and the closer it is to 1, the better the discriminatory power of the diagnostic procedure. Many diagnostic studies aim to compare the accuracy of diagnostic tests, to determine whether one test is superior to another for a given condition or disease, with data measured on any scale. Statistical inference may be parametric, nonparametric or semi-parametric. For nonparametric inference, a test for the difference between correlated AUCs from paired data was first proposed by DeLong et al.;3 it is based on asymptotic theory for U-statistics.4 However, the validity of this or any other such method relies on a large sample size; when the sample is small, a test for the difference between two or more AUCs may not be valid.

Two permutation tests for paired ROC studies currently exist: one proposed by Venkatraman & Begg5 and the more recent test of Bandos et al.6 The test of Bandos et al.6 tests directly for equality of AUCs, while the test of Venkatraman & Begg5 is more general and tests for equality of the underlying ROC curves. As a result, the test of Venkatraman & Begg5 is less powerful for testing equality of AUCs. Both permutation tests are carried out by permuting the labels of the two tests within each subject, diseased and non-diseased alike. Such an approach implicitly assumes that the two tests are exchangeable within a subject, and it requires an appropriate transformation, such as ranks, when the tests differ in scale. Bandos et al.6 compared their test with that of DeLong et al.3 by simulation and found that the permutation test had greater power than the nonparametric test of DeLong et al.3 when the correlation between the two tests was moderate, the AUCs were large, and the sample sizes were small.
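
The label-swapping idea described above can be illustrated with a minimal Python sketch. It is not the exact procedure of Bandos et al. or of Venkatraman & Begg; it only assumes the general recipe stated in the abstract: rank-transform each test, then randomly swap the two tests' values within each subject and recompute the AUC difference. The function names (`auc`, `to_ranks`, `paired_auc_permutation_test`), the Monte Carlo approximation with `n_perm` random swaps, and the toy data are all assumptions made for illustration.

```python
import random

def auc(diseased, healthy):
    """Empirical AUC: share of (diseased, healthy) pairs in which the
    diseased subject scores higher, with ties counted as one half."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))

def to_ranks(values):
    """Mid-ranks (tied values share their average rank), used here to
    put two tests measured on different scales on a common footing."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks

def paired_auc_permutation_test(test_a, test_b, is_diseased, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value (two-sided) for H0: AUC_A == AUC_B.
    Assumption: ranks are taken over all subjects within each test."""
    rng = random.Random(seed)
    a, b = to_ranks(test_a), to_ranks(test_b)

    def auc_diff(x, y):
        xd = [v for v, d in zip(x, is_diseased) if d]
        xh = [v for v, d in zip(x, is_diseased) if not d]
        yd = [v for v, d in zip(y, is_diseased) if d]
        yh = [v for v, d in zip(y, is_diseased) if not d]
        return auc(xd, xh) - auc(yd, yh)

    observed = auc_diff(a, b)
    extreme = 0
    for _ in range(n_perm):
        pa, pb = [], []
        for x, y in zip(a, b):
            if rng.random() < 0.5:  # swap the two test labels for this subject
                x, y = y, x
            pa.append(x)
            pb.append(y)
        # small tolerance guards against floating-point noise in the comparison
        if abs(auc_diff(pa, pb)) >= abs(observed) - 1e-12:
            extreme += 1
    # add-one correction keeps the estimated p-value strictly positive
    return observed, (extreme + 1) / (n_perm + 1)

# Toy paired data (hypothetical): three diseased and three non-diseased subjects.
status = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
scores_b = [5, 1, 4, 3, 2, 6]
diff, p_value = paired_auc_permutation_test(scores_a, scores_b, status)
```

With so few subjects the number of distinct swaps is small (2^6 here), so enumerating every swap pattern would give an exact p-value; random sampling is used in the sketch only to keep it short.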
