Raul Matsushita, Gabriel Gomes, Regina Da Fonseca, Eduardo Nakano, Roberto Vila
Journal of Statistical Planning and Inference, Volume 242, Article 106350. Published 2025-09-19. DOI: 10.1016/j.jspi.2025.106350
Assessing goodness-of-fit for sparse categories using Rényi divergence
We present the Rényi divergence as a statistic for assessing goodness-of-fit in sparse frequency tables, where small expected counts can undermine the reliability of the traditional chi-square test. The Rényi divergence with index in (0,1) is a natural choice because it circumvents the problems caused by dividing by small frequencies. Our main result demonstrates that the Rényi statistic asymptotically follows a chi-square distribution. Through theoretical insights and Monte Carlo simulations, we evaluate the performance of the Rényi statistic across various values of the divergence index. We find that smaller index values improve the alignment of the Rényi statistic with the chi-square distribution and enhance its performance in sparse data settings. Additionally, the Rényi statistic exhibits good power properties in detecting deviations from the null hypothesis under these conditions. To illustrate its practical applicability, we present two real-world data analyses, highlighting the robustness of the Rényi divergence in scenarios involving sparse categories.
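The abstract does not spell out how the statistic is normalized, so the following is only a minimal sketch under an assumption of ours. For index α in (0,1), the Rényi divergence is D_α(p ‖ q) = log(Σ p_i^α q_i^(1−α)) / (α − 1), which contains only powers of the cell probabilities and never divides by them. A second-order Taylor expansion gives D_α(p̂ ‖ p₀) ≈ (α/2) Σ (p̂_i − p₀_i)² / p₀_i, so the scaling T_α = (2n/α) D_α(p̂ ‖ p₀) matches Pearson's X² to second order and is referred here to a chi-square distribution with k − 1 degrees of freedom for a fully specified null with k cells. This scaling may differ from the authors' exact definition, and all function names (renyi_divergence, renyi_statistic, renyi_test, chi2_sf, null_rejection_rate) are hypothetical. The chi-square tail is computed with a closed-form recursion so the sketch needs only the standard library.

```python
import math
import random
from collections import Counter


def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(p || q) for 0 < alpha < 1.

    Only powers of p_i and q_i appear, so an empty observed cell (p_i = 0)
    simply contributes 0 and no division by a small frequency occurs.
    Assumes every null probability q_i is strictly positive.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, math.log(s) / (alpha - 1.0))


def renyi_statistic(counts, p0, alpha):
    """Assumed scaling T_alpha = (2n / alpha) * D_alpha(p_hat || p0)."""
    n = sum(counts)
    p_hat = [c / n for c in counts]
    return 2.0 * n / alpha * renyi_divergence(p_hat, p0, alpha)


def chi2_sf(stat, df):
    """Upper tail P(X > stat) for a chi-square with integer df >= 1.

    Uses the regularized upper incomplete gamma Q(df/2, stat/2) and the
    recursion Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1),
    starting from Q(1, x) = exp(-x) or Q(1/2, x) = erfc(sqrt(x)).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0.0:
        return 1.0
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q


def renyi_test(counts, p0, alpha=0.5):
    """Return (statistic, asymptotic p-value) with k - 1 degrees of freedom."""
    stat = renyi_statistic(counts, p0, alpha)
    return stat, chi2_sf(stat, len(counts) - 1)


def null_rejection_rate(p0, n, alpha, reps=2000, level=0.05, seed=1):
    """Monte Carlo estimate of the test's size when H0: p = p0 is true."""
    rng = random.Random(seed)
    k = len(p0)
    rejections = 0
    for _ in range(reps):
        draws = Counter(rng.choices(range(k), weights=p0, k=n))
        counts = [draws[i] for i in range(k)]
        _, pval = renyi_test(counts, p0, alpha)
        rejections += pval < level
    return rejections / reps


if __name__ == "__main__":
    # Hypothetical null with several small cells and a small sample size.
    p0 = [0.30, 0.30, 0.20, 0.10, 0.05, 0.03, 0.02]
    for a in (0.1, 0.5, 0.9):
        print(f"alpha={a}: estimated size {null_rejection_rate(p0, 30, a):.3f}")
```

As α approaches 1, D_α tends to the Kullback–Leibler divergence and the assumed T_α tends to the likelihood-ratio statistic G² = 2n KL(p̂ ‖ p₀), which is one way to sanity-check the scaling. The simulation loop only shows how one could compare estimated size against the nominal 5% level for different indices; it is not a reproduction of the paper's experiments, and no particular outcome is implied.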
About the journal:
The Journal of Statistical Planning and Inference offers itself as a multifaceted and all-inclusive bridge between classical aspects of statistics and probability, and the emerging interdisciplinary aspects that have the potential to revolutionize the subject. While we maintain our traditional strength in statistical inference, design, classical probability, and large sample methods, we also have a far broader and more inclusive scope to keep up with the new problems that confront us as statisticians, mathematicians, and scientists.
We publish high-quality articles in all branches of statistics, probability, discrete mathematics, machine learning, and bioinformatics. We also especially welcome well-written and up-to-date review articles on fundamental themes of statistics, probability, machine learning, and general biostatistics. Thoughtful letters to the editors, interesting problems in need of a solution, and short notes carrying an element of elegance or beauty are equally welcome.