{"title":"稀有事件数据和稀疏模型的尺度不变最优抽样。","authors":"Jing Wang, HaiYing Wang, Hao Helen Zhang","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Subsampling is effective in tackling computational challenges for massive data with rare events. Overly aggressive subsampling may adversely affect estimation efficiency, and optimal subsampling is essential to mitigate the information loss. However, existing optimal subsampling probabilities depends on data scales, and some scaling transformations may result in inefficient subsamples. This problem is more significant when there are inactive features, because their influence on the subsampling probabilities can be arbitrarily magnified by inappropriate scaling transformations. We tackle this challenge and introduce a scale-invariant optimal subsampling function in the context of sparse models, where inactive features are commonly assumed. Instead of focusing on estimating model parameters, we define an optimal subsampling function to minimize the prediction error, using adaptive lasso as an example to outline the estimation procedure and study its theoretical guarantee. We first introduce the adaptive lasso estimator for rare-events data and establish its oracle properties, thereby validating the use of subsampling. Then we derive a scale-invariant optimal subsampling function that minimizes the prediction error of the inverse probability weighted (IPW) adaptive lasso. Finally, we present an estimator based on the maximum sampled conditional likelihood (MSCL) to further improve the estimation efficiency. We conduct numerical experiments using both simulated and real-world data sets to demonstrate the performance of the proposed methods.</p>","PeriodicalId":72099,"journal":{"name":"Advances in neural information processing systems","volume":"37 ","pages":"98384-98418"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12245189/pdf/","citationCount":"0","resultStr":"{\"title\":\"Scale-invariant Optimal Sampling for Rare-events Data and Sparse Models.\",\"authors\":\"Jing Wang, HaiYing Wang, Hao Helen Zhang\",\"doi\":\"\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Subsampling is effective in tackling computational challenges for massive data with rare events. Overly aggressive subsampling may adversely affect estimation efficiency, and optimal subsampling is essential to mitigate the information loss. However, existing optimal subsampling probabilities depends on data scales, and some scaling transformations may result in inefficient subsamples. This problem is more significant when there are inactive features, because their influence on the subsampling probabilities can be arbitrarily magnified by inappropriate scaling transformations. We tackle this challenge and introduce a scale-invariant optimal subsampling function in the context of sparse models, where inactive features are commonly assumed. Instead of focusing on estimating model parameters, we define an optimal subsampling function to minimize the prediction error, using adaptive lasso as an example to outline the estimation procedure and study its theoretical guarantee. We first introduce the adaptive lasso estimator for rare-events data and establish its oracle properties, thereby validating the use of subsampling. 
Then we derive a scale-invariant optimal subsampling function that minimizes the prediction error of the inverse probability weighted (IPW) adaptive lasso. Finally, we present an estimator based on the maximum sampled conditional likelihood (MSCL) to further improve the estimation efficiency. We conduct numerical experiments using both simulated and real-world data sets to demonstrate the performance of the proposed methods.</p>\",\"PeriodicalId\":72099,\"journal\":{\"name\":\"Advances in neural information processing systems\",\"volume\":\"37 \",\"pages\":\"98384-98418\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12245189/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Advances in neural information processing systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in neural information processing systems","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Scale-invariant Optimal Sampling for Rare-events Data and Sparse Models.
Subsampling is effective for tackling the computational challenges of massive data with rare events. Overly aggressive subsampling may adversely affect estimation efficiency, so optimal subsampling is essential to mitigate the information loss. However, existing optimal subsampling probabilities depend on the scale of the data, and some scaling transformations may result in inefficient subsamples. This problem is more significant when there are inactive features, because their influence on the subsampling probabilities can be arbitrarily magnified by inappropriate scaling transformations. We tackle this challenge and introduce a scale-invariant optimal subsampling function in the context of sparse models, where inactive features are commonly assumed. Instead of focusing on estimating model parameters, we define an optimal subsampling function to minimize the prediction error, using the adaptive lasso as an example to outline the estimation procedure and study its theoretical guarantees. We first introduce the adaptive lasso estimator for rare-events data and establish its oracle properties, thereby validating the use of subsampling. We then derive a scale-invariant optimal subsampling function that minimizes the prediction error of the inverse probability weighted (IPW) adaptive lasso. Finally, we present an estimator based on the maximum sampled conditional likelihood (MSCL) to further improve estimation efficiency. We conduct numerical experiments using both simulated and real-world data sets to demonstrate the performance of the proposed methods.
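The pipeline the abstract describes, keep every rare event, subsample the non-events, then fit an inverse-probability-weighted adaptive lasso, can be illustrated with a minimal sketch. The uniform control-sampling rate `pi0` below is a placeholder, not the paper's scale-invariant optimal subsampling function, and the simulated data, variable names, and tuning constants are all illustrative assumptions.

```python
# Minimal sketch: case-control subsampling + IPW adaptive lasso (illustrative;
# the sampling rate pi0 stands in for the paper's optimal subsampling function).
import numpy as np
from sklearn.linear_model import LogisticRegression  # requires scikit-learn >= 1.2

rng = np.random.default_rng(0)

# --- simulate rare-events data with a sparse truth -----------------------
n, p = 100_000, 20
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.5]            # 3 active features, 17 inactive
X = rng.standard_normal((n, p))
eta = X @ beta_true - 6.0                   # large negative intercept -> rare events
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

# --- subsample: keep every event, Poisson-sample the non-events ----------
pi0 = 0.02                                  # placeholder control-sampling rate
keep = (y == 1) | (rng.uniform(size=n) < pi0)
Xs, ys = X[keep], y[keep]
# inverse-probability weights: 1 for events, 1/pi0 for sampled non-events
ipw = np.where(ys == 1, 1.0, 1.0 / pi0)

# --- pilot fit on the subsample to build adaptive-lasso weights ----------
pilot = LogisticRegression(penalty=None, max_iter=1000)
pilot.fit(Xs, ys, sample_weight=ipw)
adapt = np.abs(pilot.coef_.ravel()) + 1e-8  # adaptive penalty ~ 1/|pilot beta_j|

# --- adaptive lasso via column rescaling + an ordinary L1 solver ---------
lasso = LogisticRegression(penalty="l1", C=0.5, solver="liblinear", max_iter=1000)
lasso.fit(Xs * adapt, ys, sample_weight=ipw)
beta_hat = lasso.coef_.ravel() * adapt      # unscale back to the original columns

print("estimated active set:", np.nonzero(np.abs(beta_hat) > 1e-6)[0])
```

The column-rescaling step is a standard way to obtain an adaptive lasso from a plain L1 solver: scaling column j by the pilot estimate's magnitude before fitting and unscaling afterwards is equivalent to penalizing each coefficient by lambda / |beta_pilot_j|, so features with tiny pilot estimates are penalized heavily and tend to be screened out.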