Ansgar Steland, Ewaryst Rafajłowicz, Wojciech Rafajłowicz
Statistica Neerlandica, published 2024-08-22. DOI: 10.1111/stan.12352 (https://doi.org/10.1111/stan.12352)
General adapted‐threshold monitoring in discrete environments and rules for imbalanced classes
With applications in statistics and machine learning in mind, such as individualized care monitoring or watermark detection in large language models, we consider the following general setting: when monitoring a sequence of observations, there may be additional information on the environment that should be used to design the monitoring procedure. This additional information can be incorporated by applying threshold functions to the standardized measurements, which adapts the detector to the environment. For the case where discrete‐valued environmental information is encoded as categorical data, we study several classes of level threshold functions, including a proportional one that favors rare events among imbalanced classes. For the latter rule, asymptotic theory is developed for independent and identically distributed learning samples as well as dependent ones, including data from new discrete autoregressive moving average (NDARMA) series and hidden Markov models. Further, we propose two‐stage designs that allow the budget to be distributed in a controlled way over an a priori partition of the sample space. The approach is illustrated with a real medical data set.
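The abstract does not state the exact form of the threshold functions, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a hypothetical proportional rule in which the threshold for an environment category is a constant `scale` times that category's relative frequency in a learning sample, so that rare categories receive lower thresholds and are flagged more readily. All function names, the `scale` parameter, and the toy data are illustrative assumptions.

```python
from collections import Counter


def estimate_class_probs(learning_sample):
    """Relative frequency of each environment category in a learning sample."""
    counts = Counter(learning_sample)
    n = len(learning_sample)
    return {category: count / n for category, count in counts.items()}


def proportional_thresholds(probs, scale):
    """Hypothetical proportional rule (an assumption, not taken from the paper):
    the threshold for a category is scale * (its estimated probability),
    so rare categories get lower thresholds."""
    return {category: scale * p for category, p in probs.items()}


def monitor(z_scores, environments, thresholds):
    """Return the indices at which the absolute standardized measurement
    exceeds the threshold assigned to the accompanying environment category."""
    return [
        i
        for i, (z, env) in enumerate(zip(z_scores, environments))
        if abs(z) > thresholds[env]
    ]


# Toy learning sample with imbalanced categories (illustrative data only).
learning_sample = ["common"] * 8 + ["rare"] * 2
probs = estimate_class_probs(learning_sample)
thresholds = proportional_thresholds(probs, scale=3.0)

# Standardized measurements observed together with their environment category.
z_scores = [1.0, 1.0, 3.0, 0.5]
environments = ["common", "rare", "common", "rare"]
alarms = monitor(z_scores, environments, thresholds)
```

In this toy setup the same standardized value of 1.0 raises an alarm in the rare category but not in the common one, which is the sense in which such a rule favors rare events. A category that never appears in the learning sample has no threshold here and would need separate handling.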
About the journal:
Statistica Neerlandica has been the journal of the Netherlands Society for Statistics and Operations Research since 1946. It covers all areas of statistics, from theoretical to applied, with a special emphasis on mathematical statistics, statistics for the behavioural sciences and biostatistics. This wide scope is reflected by the expertise of the journal’s editors representing these areas. The diverse editorial board is committed to a fast and fair reviewing process, and will judge submissions on quality, correctness, relevance and originality. Statistica Neerlandica encourages transparency and reproducibility, and offers online resources to make data, code, simulation results and other additional materials publicly available.