Sequential Outlier Hypothesis Testing Under Universality Constraints

IF 2.9 · Zone 3 (Computer Science) · Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Information Theory · Pub Date: 2025-06-25 · DOI: 10.1109/TIT.2025.3583065

Jun Diao; Lin Zhou

IEEE Transactions on Information Theory, vol. 71, no. 9, pp. 6602-6625. https://ieeexplore.ieee.org/document/11051023/

Citations: 0

Abstract

We revisit sequential outlier hypothesis testing and derive bounds on achievable exponents when both the nominal and anomalous distributions are unknown. The task of outlier hypothesis testing is to identify the set of outliers, which are generated from an anomalous distribution, among all observed sequences, where the remaining majority are generated from a nominal distribution. In the sequential setting, one obtains a symbol from each sequence per unit time until a reliable decision can be made. For the case with exactly one outlier, our exponent bounds are tight, providing an exact large deviations characterization of sequential tests and strengthening a previous result of Li et al. (2017). In particular, the average sample size of our sequential test is bounded universally under any pair of nominal and anomalous distributions, and our sequential test achieves a larger Bayesian exponent than the fixed-length test, which could not be guaranteed by the sequential test of Li et al. (2017). For the case with at most one outlier, we propose a threshold-based test that has bounded expected stopping time under mild conditions, and we bound the exponential decay rates of the error probabilities, a.k.a. error exponents, under each non-null hypothesis and the null hypothesis. Our sequential test resolves the tradeoff among the exponential decay rates of misclassification, false reject and false alarm probabilities for the fixed-length test of Zhou et al. (2022). Finally, as a further step towards practical applications, we generalize our results to the case of multiple outliers and show that there is a penalty in the error exponents when the number of outliers is unknown.
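
The sequential setting described in the abstract (one new symbol from every sequence per time step, stop once a decision looks reliable) can be illustrated with a minimal Python sketch. This is not the test analyzed in the paper. The scoring function (a sum of KL divergences between each remaining sequence's empirical distribution and the pooled empirical distribution of the remaining sequences), the stopping rule (stop once n times the second-smallest score exceeds a fixed threshold), and all names (`kl`, `scores`, `sequential_outlier_test`, `draw`, `threshold`) are assumptions chosen only to illustrate the exactly-one-outlier case with unknown distributions.

```python
import math
import random


def kl(p, q):
    """KL divergence D(p || q) in nats; symbols with p[a] == 0 contribute 0."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)


def scores(counts, n):
    """Score every candidate outlier i (assumed scoring rule, needs m >= 3).

    counts[j][a] is how often symbol a appeared in sequence j after n steps.
    The score of candidate i measures how far the other sequences are from
    sharing one common distribution: a small score means "i is the outlier"
    explains the data well.
    """
    m, k = len(counts), len(counts[0])
    total = [sum(c[a] for c in counts) for a in range(k)]
    result = []
    for i in range(m):
        # Pooled empirical distribution of all sequences except candidate i.
        pooled = [(total[a] - counts[i][a]) / (n * (m - 1)) for a in range(k)]
        s = 0.0
        for j in range(m):
            if j != i:
                s += kl([c / n for c in counts[j]], pooled)
        result.append(s)
    return result


def sequential_outlier_test(draw, m, k, threshold, max_steps=10_000):
    """Draw one symbol per sequence per step until the stopping rule fires.

    draw(i) returns the next symbol (an int in range(k)) of sequence i.
    Assumed stopping rule: stop at the first n where n times the
    second-smallest score exceeds `threshold`, then declare the candidate
    with the smallest score. Returns (declared outlier index, stopping time).
    """
    counts = [[0] * k for _ in range(m)]
    best = 0
    for n in range(1, max_steps + 1):
        for i in range(m):
            counts[i][draw(i)] += 1
        s = scores(counts, n)
        order = sorted(range(m), key=lambda i: s[i])
        best = order[0]
        if n * s[order[1]] > threshold:
            return best, n
    return best, max_steps  # no confident decision within max_steps


if __name__ == "__main__":
    # Hypothetical example: 5 sequences over a ternary alphabet, outlier at index 2.
    rng = random.Random(0)
    nominal, anomalous = [0.5, 0.3, 0.2], [0.1, 0.2, 0.7]

    def draw(i):
        weights = anomalous if i == 2 else nominal
        return rng.choices(range(3), weights=weights)[0]

    decision, stop_time = sequential_outlier_test(draw, m=5, k=3, threshold=25.0)
    print("declared outlier:", decision, "stopping time:", stop_time)
```

In this sketch a larger `threshold` trades a longer stopping time for a more reliable decision, which is the qualitative tradeoff that the error exponents and the bounded expected stopping time in the abstract quantify. The test never uses `nominal` or `anomalous` directly; it sees only the drawn symbols.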

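Source journal

IEEE Transactions on Information Theory (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

5.70

Self-citation rate

20.00%

Number of publications

514

Review time

12 months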

About the journal: The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.