On learning sparse linear models from cross samples

IF 3.4 | Zone 2, Engineering & Technology | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
Mina Sadat Mahmoudi, Seyed Abolfazl Motahari, Babak Khalaj
Signal Processing, vol. 227, Article 109680. Published 2024-08-27.
DOI: 10.1016/j.sigpro.2024.109680
https://www.sciencedirect.com/science/article/pii/S0165168424003001
Citations: 0

On learning sparse linear models from cross samples

This paper studies the sample complexity of sparse linear models when the samples are dependent. We consider a specific dependency structure that arises in experimental designs such as drug sensitivity studies, where two sets of objects (drugs and cells) are sampled independently and, after crossing (forming all possible drug-cell combinations), the resulting output (drug efficacy) is measured. We call such samples “cross samples”. The dependency among these samples is strong, and existing theoretical studies are either inapplicable or fail to provide realistic bounds. We analyze the performance of the Lasso estimator when the underlying distributions are mixtures of Gaussians and the data dependency arises from the crossing procedure. Our theoretical results show that the performance of the Lasso estimator on cross samples follows that on i.i.d. samples, differing only in constant factors. In numerical experiments we observe a phase transition: when datasets are too small, the error for cross samples is much larger than for i.i.d. samples, but once the dataset is large enough, cross samples are nearly as useful as i.i.d. samples. Our theoretical analysis suggests that the transition threshold is governed by the sparsity level of the true parameter vector being estimated.
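
The crossing procedure described in the abstract can be illustrated with a small sketch. This is not the authors' code, and the abstract does not specify the details. The two-component Gaussian mixture parameters, the concatenation of drug and cell features into one covariate vector, the noise level, and the names `sample_mixture` and `make_cross_samples` are all assumptions made here for illustration.

```python
import itertools
import random


def sample_mixture(rng, dim, means=(-1.0, 1.0), sigma=1.0):
    """Draw one vector from a two-component Gaussian mixture (assumed parameters)."""
    center = rng.choice(means)  # pick one mixture component for the whole vector
    return [rng.gauss(center, sigma) for _ in range(dim)]


def make_cross_samples(n_drugs, n_cells, dim_drug, dim_cell, beta, noise_sd=0.1, seed=0):
    """Sample drugs and cells independently, then cross them into n_drugs * n_cells rows."""
    rng = random.Random(seed)
    drugs = [sample_mixture(rng, dim_drug) for _ in range(n_drugs)]
    cells = [sample_mixture(rng, dim_cell) for _ in range(n_cells)]
    X, y = [], []
    # Crossing: every drug is paired with every cell.
    for drug, cell in itertools.product(drugs, cells):
        x = drug + cell  # assumed feature map: plain concatenation
        X.append(x)
        # Assumed linear response with additive Gaussian noise.
        y.append(sum(b * v for b, v in zip(beta, x)) + rng.gauss(0.0, noise_sd))
    return X, y


# Hypothetical sparse parameter vector: only two of six coefficients are nonzero.
beta = [2.0, 0.0, 0.0, -1.0, 0.0, 0.0]
X, y = make_cross_samples(n_drugs=3, n_cells=4, dim_drug=3, dim_cell=3, beta=beta)
print(len(X))  # 12 rows, built from only 3 + 4 independent draws
```

Rows that share a drug repeat the same drug features, and rows that share a cell repeat the same cell features, which is the dependence that separates cross samples from i.i.d. samples. Any Lasso solver could then be fitted to `X` and `y` to compare against an i.i.d. design of the same size.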

Source journal
Signal Processing (Engineering & Technology; Engineering: Electrical & Electronic)
CiteScore: 9.20
Self-citation rate: 9.10%
Articles published: 309
Review time: 41 days
Journal description: Signal Processing incorporates all aspects of the theory and practice of signal processing. It features original research work, tutorial and review articles, and accounts of practical developments. It is intended for a rapid dissemination of knowledge and experience to engineers and scientists working in the research, development or practical application of signal processing. Subject areas covered by the journal include: Signal Theory; Stochastic Processes; Detection and Estimation; Spectral Analysis; Filtering; Signal Processing Systems; Software Developments; Image Processing; Pattern Recognition; Optical Signal Processing; Digital Signal Processing; Multi-dimensional Signal Processing; Communication Signal Processing; Biomedical Signal Processing; Geophysical and Astrophysical Signal Processing; Earth Resources Signal Processing; Acoustic and Vibration Signal Processing; Data Processing; Remote Sensing; Signal Processing Technology; Radar Signal Processing; Sonar Signal Processing; Industrial Applications; New Applications.