Statistical Analysis and Data Mining: Latest Articles

Integrative Learning of Structured High-Dimensional Data from Multiple Datasets.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2023-04-01 Epub Date: 2022-11-08 DOI: 10.1002/sam.11601
Changgee Chang, Zongyu Dai, Jihwan Oh, Qi Long
Abstract: Integrative learning of multiple datasets has the potential to mitigate the challenge of small n and large p that is often encountered in analysis of big biomedical data such as genomics data. Detection of weak yet important signals can be enhanced by jointly selecting features for all datasets. However, the set of important features may not always be the same across all datasets. Although some existing integrative learning methods allow heterogeneous sparsity structure where a subset of datasets can have zero coefficients for some selected features, they tend to yield reduced efficiency, reinstating the problem of losing weak important signals. We propose a new integrative learning approach which can not only aggregate important signals well in homogeneous sparsity structure, but also substantially alleviate the problem of losing weak important signals in heterogeneous sparsity structure. Our approach exploits a priori known graphical structure of features and encourages joint selection of features that are connected in the graph. Integrating such prior information over multiple datasets enhances the power, while also accounting for the heterogeneity across datasets. Theoretical properties of the proposed method are investigated. We also demonstrate the limitations of existing approaches and the superiority of our method using a simulation study and analysis of gene expression data from ADNI.
Journal Article. Volume 16, Issue 2, pp. 120-134. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10195070/pdf/
Citations: 0
Multi-scale affinities with missing data: Estimation and applications.
IF 2.1, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2022-06-01 Epub Date: 2021-11-05 DOI: 10.1002/sam.11561
Min Zhang, Gal Mishne, Eric C Chi
Abstract: Many machine learning algorithms depend on weights that quantify row and column similarities of a data matrix. The choice of weights can dramatically impact the effectiveness of the algorithm. Nonetheless, the problem of choosing weights has arguably not been given enough study. When a data matrix is completely observed, Gaussian kernel affinities can be used to quantify the local similarity between pairs of rows and pairs of columns. Computing weights in the presence of missing data, however, becomes challenging. In this paper, we propose a new method to construct row and column affinities even when data are missing by building off a co-clustering technique. This method takes advantage of solving the optimization problem for multiple pairs of cost parameters and filling in the missing values with increasingly smooth estimates. It exploits the coupled similarity structure among both the rows and columns of a data matrix. We show these affinities can be used to perform tasks such as data imputation, clustering, and matrix completion on graphs.
Journal Article. Volume 15, Issue 3, pp. 303-313. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9216212/pdf/nihms-1751214.pdf
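Illustration (not from the paper): the abstract mentions Gaussian kernel affinities for a completely observed data matrix. The sketch below shows one common form of that baseline, w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), computed between rows. The function name `gaussian_affinities`, the bandwidth parameter `sigma`, and the toy matrix are assumptions made for this example; the paper's missing-data method is not reproduced here.

```python
import math


def gaussian_affinities(rows, sigma=1.0):
    """Pairwise Gaussian kernel affinities between fully observed rows.

    Hypothetical helper for illustration: `rows` is a list of equal-length
    numeric lists, `sigma` is an assumed kernel bandwidth.
    """
    n = len(rows)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between row i and row j.
            sq_dist = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            weights[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return weights


# Toy, fully observed data matrix (3 rows, 2 columns).
X = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
W = gaussian_affinities(X, sigma=1.0)
```

Column affinities could be obtained the same way by passing the transposed matrix, for example `gaussian_affinities([list(col) for col in zip(*X)])`. This direct computation breaks down once entries are missing, since the distances are no longer defined for every pair.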
Citations: 0
Sample Selection Bias in Evaluation of Prediction Performance of Causal Models.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2022-02-01 DOI: 10.1002/sam.11559
James P Long, Min Jin Ha
Abstract: Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding. New scientific experiments offer the possibility of evaluating causal models using prediction performance. Prediction performance measures are typically robust to violations in causal assumptions. However prediction performance does depend on the selection of training and test sets. In particular biased training sets can lead to optimistic assessments of model performance. In this work, we revisit the prediction performance of several recently proposed causal models tested on a genetic perturbation data set of Kemmeren [5]. We find that sample selection bias is likely a key driver of model performance. We propose using a less-biased evaluation set for assessing prediction performance and compare models on this new set. In this setting, the causal models have similar or worse performance compared to standard association based estimators such as Lasso. Finally we compare the performance of causal estimators in simulation studies which reproduce the Kemmeren structure of genetic knockout experiments but without any sample selection bias. These results provide an improved understanding of the performance of several causal models and offer guidance on how future studies should use Kemmeren.
Journal Article. Volume 15, Issue 1, pp. 5-14. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9053600/pdf/nihms-1746637.pdf
Citations: 2
Practical Bayesian Modeling and Inference for Massive Spatial Datasets On Modest Computing Environments.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2019-06-01 DOI: 10.1002/sam.11413
Lu Zhang, Abhirup Datta, Sudipto Banerjee
Abstract: With continued advances in Geographic Information Systems and related computational technologies, statisticians are often required to analyze very large spatial datasets. This has generated substantial interest over the last decade, already too vast to be summarized here, in scalable methodologies for analyzing large spatial datasets. Scalable spatial process models have been found especially attractive due to their richness and flexibility and, particularly so in the Bayesian paradigm, due to their presence in hierarchical model settings. However, the vast majority of research articles present in this domain have been geared toward innovative theory or more complex model development. Very limited attention has been accorded to approaches for easily implementable scalable hierarchical models for the practicing scientist or spatial analyst. This article devises massively scalable Bayesian approaches that can rapidly deliver inference on spatial process that are practically indistinguishable from inference obtained using more expensive alternatives. A key emphasis is on implementation within very standard (modest) computing environments (e.g., a standard desktop or laptop) using easily available statistical software packages. Key insights are offered regarding assumptions and approximations concerning practical efficiency.
Journal Article. Volume 12, Issue 3, pp. 197-209.
Citations: 28