树值：回归树的选择性推理

IF 5.2 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research Pub Date : 2022-01-01

Anna C Neufeld, Lucy L Gao, Daniela M Witten

{"title":"树值：回归树的选择性推理","authors":"Anna C Neufeld, Lucy L Gao, Daniela M Witten","doi":"","DOIUrl":null,"url":null,"abstract":"We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"23 ","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10933572/pdf/","citationCount":"0","resultStr":"{\"title\":\"Tree-Values: Selective Inference for Regression Trees.\",\"authors\":\"Anna C Neufeld, Lucy L Gao, Daniela M Witten\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.\",\"PeriodicalId\":50161,\"journal\":{\"name\":\"Journal of Machine Learning Research\",\"volume\":\"23 \",\"pages\":\"\"},\"PeriodicalIF\":5.2000,\"publicationDate\":\"2022-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10933572/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Machine Learning Research\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Machine Learning Research","FirstCategoryId":"94","ListUrlMain":"","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

我们考虑对分类与回归树（CART）（Breiman 等人，1984 年）算法的输出结果进行推断。如果不考虑该树是根据数据估计出来的这一事实，天真的推理方法将无法实现标准保证，如第一类错误率控制和名义覆盖率。因此，我们提出了一种选择性推断框架，用于对拟合 CART 树进行推断。简而言之，我们的条件是该树是根据数据估计出来的。我们提出了一对终端节点之间平均响应差异的检验方法，以控制选择性 1 类错误率，并提出了单个终端节点内平均响应的置信区间，以实现名义选择性覆盖。我们提供了计算必要条件集的高效算法。我们将这些方法应用于模拟和一个数据集，该数据集涉及份量控制干预与热量摄入之间的关联。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

本刊更多论文

Tree-Values: Selective Inference for Regression Trees.

We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Machine Learning Research 工程技术-计算机：人工智能

CiteScore

18.80

自引率

0.00%

发文量

审稿时长

3 months

期刊介绍： The Journal of Machine Learning Research (JMLR) provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online. JMLR has a commitment to rigorous yet rapid reviewing. JMLR seeks previously unpublished papers on machine learning that contain: new principled algorithms with sound empirical validation, and with justification of theoretical, psychological, or biological nature; experimental and/or theoretical studies yielding new insight into the design and behavior of learning in intelligent systems; accounts of applications of existing techniques that shed light on the strengths and weaknesses of the methods; formalization of new learning tasks (e.g., in the context of new applications) and of methods for assessing performance on those tasks; development of new analytical frameworks that advance theoretical studies of practical learning methods; computational models of data from natural learning systems at the behavioral or neural level; or extremely well-written surveys of existing work.