An extended McNemar test for comparing correlated proportion of positive responses
Authors: Okeh Uchechukwu Marius, Obiora-Ilouno Happiness
Journal: Biometrics & Biostatistics International Journal
Publication date: 2019-07-08
DOI: 10.15406/bbij.2019.08.00281 (https://doi.org/10.15406/bbij.2019.08.00281)
Citations: 0
Abstract
The receiver operating characteristic (ROC) curve is a standard tool for evaluating the performance of a diagnostic test whose results are measured on a continuous or ordinal scale.1 ROC methodology was first developed by electrical and radar engineers during World War II to detect signals on the battlefield, and it was formalized within signal detection theory in the 1950s.2 In an ROC curve, the true positive rate (TPR) is plotted against the false positive rate (FPR) across all possible cut-off values in order to support a meaningful decision. The area under the ROC curve (AUC) is a summary index of diagnostic accuracy. AUC ranges from 0 to 1 inclusive, and the closer its value is to 1, the better the discriminatory power of the diagnostic procedure. Many diagnostic studies aim to compare the accuracy of diagnostic tests, to determine whether one test is superior to another for a given condition or disease, with data that may be measured on any scale. Statistical inference may be parametric, nonparametric, or semi-parametric. In the nonparametric setting, a test for the difference between correlated AUCs for paired data was first proposed by DeLong et al.3 and is based on the asymptotic theory of U-statistics.4 However, the validity of this or any other asymptotic method relies on a large sample size; when the sample is small, the test for the difference between two or more AUCs may not be valid. Two permutation tests for paired ROC studies currently exist: one proposed by Venkatraman & Begg5 and the more recent test of Bandos et al.6 The test of Bandos et al.6 directly tests equality of AUCs, while the test of Venkatraman & Begg5 is more general and tests equality of the underlying ROC curves. As a result, the test of Venkatraman & Begg5 is less powerful for testing equality of AUCs.
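To make the AUC concrete, the sketch below computes the empirical (nonparametric) AUC as a Mann-Whitney-type statistic: the proportion of (diseased, non-diseased) pairs in which the diseased subject has the higher score, with ties counted as one half. This is a minimal illustration and not code from the paper; the function name `empirical_auc` and the convention that higher scores indicate disease are assumptions made for the example.

```python
from typing import Sequence


def empirical_auc(diseased: Sequence[float], non_diseased: Sequence[float]) -> float:
    """Empirical AUC: share of (diseased, non-diseased) pairs ranked correctly.

    Assumes higher scores point toward disease. Ties contribute 1/2.
    """
    if len(diseased) == 0 or len(non_diseased) == 0:
        raise ValueError("both groups must contain at least one subject")

    concordant = 0.0
    for x in diseased:
        for y in non_diseased:
            if x > y:
                concordant += 1.0   # diseased subject scored higher
            elif x == y:
                concordant += 0.5   # tie counts as half a concordant pair
    return concordant / (len(diseased) * len(non_diseased))


if __name__ == "__main__":
    # Hypothetical scores, used only to exercise the function.
    print(empirical_auc([2.0, 3.0], [2.0, 1.0]))
```

An AUC of 1 means every diseased subject scores above every non-diseased subject, while 0.5 corresponds to a test that discriminates no better than chance.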
Both permutation tests are carried out by permuting the labels of the two tests within each subject, diseased and non-diseased alike. Such an approach implicitly assumes that the two tests are exchangeable within a subject, and it requires an appropriate transformation, such as ranks, when the tests differ in scale. Using simulation, Bandos et al.6 compared their test with that of DeLong et al.3 and found that the permutation test had greater power than the nonparametric test of DeLong et al.3 when the two tests were moderately correlated, the AUCs were large, and the sample sizes were small.
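The within-subject label permutation described above can be sketched as follows. This is a generic Monte Carlo illustration of the idea, not the exact procedure or test statistic of Bandos et al. or of Venkatraman & Begg. It assumes each subject contributes a pair of scores (test A, test B) already on a common scale (for example, after a rank transformation), and all names (`auc_difference`, `paired_permutation_p_value`, `n_permutations`, `seed`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (score on test A, score on test B) for one subject


def _auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Empirical AUC; ties between a diseased and a healthy score count 1/2."""
    total = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in pos for y in neg)
    return total / (len(pos) * len(neg))


def auc_difference(diseased: Sequence[Pair], healthy: Sequence[Pair]) -> float:
    """AUC of test A minus AUC of test B on the same subjects."""
    auc_a = _auc([a for a, _ in diseased], [a for a, _ in healthy])
    auc_b = _auc([b for _, b in diseased], [b for _, b in healthy])
    return auc_a - auc_b


def _swap_labels(pairs: Sequence[Pair], rng: random.Random) -> List[Pair]:
    """Swap the two test labels within each subject with probability 1/2."""
    return [(b, a) if rng.random() < 0.5 else (a, b) for a, b in pairs]


def paired_permutation_p_value(
    diseased: Sequence[Pair],
    healthy: Sequence[Pair],
    n_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """Two-sided Monte Carlo p-value for H0: AUC_A == AUC_B."""
    rng = random.Random(seed)
    observed = abs(auc_difference(diseased, healthy))
    extreme = 0
    for _ in range(n_permutations):
        diff = auc_difference(_swap_labels(diseased, rng), _swap_labels(healthy, rng))
        if abs(diff) >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical paired scores, used only to exercise the functions.
    diseased = [(0.9, 0.6), (0.8, 0.7), (0.7, 0.4), (0.6, 0.5)]
    healthy = [(0.3, 0.5), (0.2, 0.6), (0.4, 0.3), (0.1, 0.2)]
    print(paired_permutation_p_value(diseased, healthy))
```

With very small samples, the 2^n possible label assignments could instead be enumerated exactly; random sampling is used here only to keep the sketch short.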