Maximum Likelihood Estimation of Optimal Receiver Operating Characteristic Curves From Likelihood Ratio Observations

IF 2.9 | Zone 3 (Computer Science) | JCR Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

IEEE Transactions on Information Theory | Pub Date: 2025-08-04 | DOI: 10.1109/TIT.2025.3595488

Bruce Hajek;Xiaohan Kang

IEEE Transactions on Information Theory, vol. 71, no. 10, pp. 7568-7584. Full text: https://ieeexplore.ieee.org/document/11111686/

Citations: 0

Abstract

The optimal receiver operating characteristic (ROC) curve, giving the maximum probability of detection as a function of the probability of false alarm, is a key information-theoretic indicator of the difficulty of a binary hypothesis testing problem (BHT). It is well known that the optimal ROC curve for a given BHT, corresponding to the likelihood ratio test, is determined by the probability distribution of the observed data under each of the two hypotheses. In some cases, these two distributions may be unknown or computationally intractable, but independent samples of the likelihood ratio can be observed. This raises the problem of estimating the optimal ROC for a BHT from such samples. The maximum likelihood estimator of the optimal ROC curve is derived, and it is shown to converge almost surely to the true optimal ROC curve in the Lévy metric, as the number of observations tends to infinity. Finite sample size bounds are obtained for three other estimators: the classical empirical estimator, based on estimating the two types of error probabilities from two separate sets of samples, and two variations of the maximum likelihood estimator called the split estimator and fused estimator, respectively. The maximum likelihood estimator is observed in simulation experiments to be considerably more accurate than the empirical estimator, especially when the number of samples obtained under one of the two hypotheses is small. The area under the maximum likelihood estimator is derived; it is a consistent estimator of the area under the true optimal ROC curve.
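As a rough illustration of the setting, the sketch below implements only the classical empirical estimator mentioned in the abstract, which estimates the false alarm and detection probabilities from two separate sets of likelihood ratio samples. It is not the paper's maximum likelihood estimator. The Gaussian shift model (H0: N(0,1) versus H1: N(mu,1)), the sample sizes, and all function names are assumptions chosen for illustration, because that model has a closed-form optimal ROC area to compare against.

```python
import bisect
import math
import random
from statistics import NormalDist


def likelihood_ratio(x, mu):
    """Likelihood ratio dP1/dP0 at x for H0: N(0,1) versus H1: N(mu,1)."""
    return math.exp(mu * x - 0.5 * mu * mu)


def empirical_roc(lr_h0, lr_h1):
    """Return empirical ROC points (p_false_alarm, p_detection).

    Each threshold tau is an observed likelihood ratio value; the test
    declares H1 when the likelihood ratio is >= tau.
    """
    s0 = sorted(lr_h0)
    s1 = sorted(lr_h1)
    points = [(0.0, 0.0)]
    for tau in sorted(set(s0 + s1), reverse=True):
        # Fraction of samples at or above tau under each hypothesis.
        pfa = (len(s0) - bisect.bisect_left(s0, tau)) / len(s0)
        pd = (len(s1) - bisect.bisect_left(s1, tau)) / len(s1)
        points.append((pfa, pd))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def true_auc(mu):
    """Closed-form area under the optimal ROC for the Gaussian shift model."""
    return NormalDist().cdf(mu / math.sqrt(2.0))


# Hypothetical experiment: 2000 likelihood ratio samples per hypothesis.
rng = random.Random(0)
mu = 1.0
lr_h0 = [likelihood_ratio(rng.gauss(0.0, 1.0), mu) for _ in range(2000)]
lr_h1 = [likelihood_ratio(rng.gauss(mu, 1.0), mu) for _ in range(2000)]
roc = empirical_roc(lr_h0, lr_h1)
estimated_auc = auc_trapezoid(roc)
print(f"estimated AUC {estimated_auc:.3f}, true AUC {true_auc(mu):.3f}")
```

Shrinking one of the two sample sets (for example, 20 samples under H1) is a simple way to see the regime in which the abstract reports the empirical estimator to be least accurate.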




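Source journal

IEEE Transactions on Information Theory (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 5.70

Self-citation rate: 20.00%

Articles published: 514

Review time: 12 months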

Journal description: The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.