General adapted-threshold monitoring in discrete environments and rules for imbalanced classes

IF 1.4, CAS Zone 3 (Mathematics), JCR Q2 (Statistics & Probability)
Ansgar Steland, Ewaryst Rafajłowicz, Wojciech Rafajłowicz
Statistica Neerlandica, Journal Article, published 2024-08-22. DOI: 10.1111/stan.12352 (https://doi.org/10.1111/stan.12352)
Citations: 0

Abstract

General adapted‐threshold monitoring in discrete environments and rules for imbalanced classes
With applications in statistics and machine learning in mind, such as individualized care monitoring or watermark detection in large language models, we consider the following general setting: when monitoring a sequence of observations, there may be additional information on the environment which should be used to design the monitoring procedure. This additional information can be incorporated by applying threshold functions to the standardized measurements, thereby adapting the detector to the environment. For the case of a categorical data encoding of discrete-valued environmental information, we study several classes of level threshold functions, including a proportional one which favors rare events among imbalanced classes. For the latter rule, asymptotic theory is developed for independent and identically distributed learning samples as well as dependent ones, including data from new discrete autoregressive moving average (NDARMA) series and Hidden Markov Models. Further, we propose two-stage designs which allow the budget to be distributed in a controlled way over an a priori partition of the sample space. The approach is illustrated by a real medical data set.
Source journal
Statistica Neerlandica (Mathematics: Statistics & Probability)
CiteScore: 2.60
Self-citation rate: 6.70%
Articles published: 26
Review time: >12 weeks
About the journal: Statistica Neerlandica has been the journal of the Netherlands Society for Statistics and Operations Research since 1946. It covers all areas of statistics, from theoretical to applied, with a special emphasis on mathematical statistics, statistics for the behavioural sciences and biostatistics. This wide scope is reflected by the expertise of the journal's editors representing these areas. The diverse editorial board is committed to a fast and fair reviewing process, and will judge submissions on quality, correctness, relevance and originality. Statistica Neerlandica encourages transparency and reproducibility, and offers online resources to make data, code, simulation results and other additional materials publicly available.