Multi-sample means comparisons for imprecise interval data

IF 3.2; Region 3, Computer Science; JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Approximate Reasoning. Pub Date: 2024-11-08. DOI: 10.1016/j.ijar.2024.109322

Yan Sun, Zac Rios, Brennan Bean

Volume 176, Article 109322. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0888613X24002093

Citations: 0

Abstract

In recent years, interval data have become an increasingly popular tool for solving modern data problems. Intervals are now often used for dimensionality reduction, data aggregation, privacy censorship, and quantifying awareness of various uncertainties. Among the many statistical methods being studied and developed for interval data, significance tests are of particular importance because of their fundamental value in both theory and practice. The difficulty in developing such tests lies mainly in the fact that the concept of normality does not extend naturally to intervals, which makes exact tests hard to formulate. As a result, most existing works have relied on bootstrap methods to approximate null distributions. However, this is not always feasible given limited sample sizes or other intrinsic characteristics of the data. In this paper, we propose a novel asymptotic test for comparing multi-sample means with interval data as a generalization of the classic ANOVA. Based on random set theory, we construct the test statistic as a ratio of between-group interval variance to within-group interval variance. The limiting null distribution is derived under the usual assumptions and mild regularity conditions. Simulation studies with various data configurations validate the asymptotic result and show promising small-sample performance. Finally, a real interval data ANOVA analysis is presented that showcases the applicability of our method.
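The ratio of between-group to within-group interval variance mentioned in the abstract can be illustrated with a rough sketch. The code below is not the authors' statistic: it assumes a center/radius squared distance between intervals, takes the mean interval as the endpoint-wise average, and borrows the degrees-of-freedom scaling from classical ANOVA. The function names (`interval_dist_sq`, `interval_mean`, `interval_anova_ratio`) are hypothetical, and the limiting null distribution derived in the paper is not reproduced here, so no p-value is computed.

```python
import numpy as np


def interval_dist_sq(a, b):
    """Squared distance between intervals stored as (lower, upper) pairs.

    Assumption: distance is measured on centers and radii,
    d^2 = (center_a - center_b)^2 + (radius_a - radius_b)^2.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d_center = (a[..., 0] + a[..., 1]) / 2 - (b[..., 0] + b[..., 1]) / 2
    d_radius = (a[..., 1] - a[..., 0]) / 2 - (b[..., 1] - b[..., 0]) / 2
    return d_center ** 2 + d_radius ** 2


def interval_mean(x):
    """Mean interval: average the lower and upper endpoints separately."""
    return np.asarray(x, dtype=float).mean(axis=0)


def interval_anova_ratio(groups):
    """ANOVA-style ratio of between-group to within-group interval variation.

    `groups` is a list of arrays, each of shape (n_k, 2).
    Assumption: the (k - 1) and (n - k) scalings are taken from classical ANOVA.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = interval_mean(np.vstack(groups))
    # Between-group sum of squares: group means against the grand mean.
    between = sum(len(g) * interval_dist_sq(interval_mean(g), grand) for g in groups)
    # Within-group sum of squares: observations against their own group mean.
    within = sum(interval_dist_sq(g, interval_mean(g)).sum() for g in groups)
    return float((between / (k - 1)) / (within / (n - k)))


# Toy data: two groups of two intervals each, with well-separated locations.
group_a = [[0.0, 2.0], [2.0, 4.0]]
group_b = [[10.0, 12.0], [12.0, 14.0]]
ratio = interval_anova_ratio([group_a, group_b])
print(ratio)
```

A large ratio suggests that the group mean intervals differ by more than the within-group scatter would explain. Turning the ratio into a test decision requires the null distribution, which the paper obtains asymptotically and which earlier work approximated by bootstrap.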


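Source journal

International Journal of Approximate Reasoning (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore: 6.90

Self-citation rate: 12.80%

Articles published: 170

Review time: 67 days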

About the journal: The International Journal of Approximate Reasoning is intended to serve as a forum for the treatment of imprecision and uncertainty in Artificial and Computational Intelligence, covering both the foundations of uncertainty theories and the design of intelligent systems for scientific and engineering applications. It publishes high-quality research papers describing theoretical developments or innovative applications, as well as review articles on topics of general interest. Relevant topics include, but are not limited to, probabilistic reasoning and Bayesian networks, imprecise probabilities, random sets, belief functions (Dempster-Shafer theory), possibility theory, fuzzy sets, rough sets, decision theory, non-additive measures and integrals, qualitative reasoning about uncertainty, comparative probability orderings, game-theoretic probability, default reasoning, nonstandard logics, argumentation systems, inconsistency-tolerant reasoning, elicitation techniques, philosophical foundations, and psychological models of uncertain reasoning. Domains of application for uncertain reasoning systems include risk analysis and assessment, information retrieval and database design, information fusion, machine learning, data and web mining, computer vision, image and signal processing, intelligent data analysis, statistics, and multi-agent systems.