Cluster-localized sparse logistic regression for SNP data.

IF 0.4, Zone 4 (Mathematics), JCR Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY

Statistical Applications in Genetics and Molecular Biology. Pub Date: 2012-08-14. DOI: 10.1515/1544-6115.1694

Harald Binder, Tina Müller, Holger Schwender, Klaus Golka, Michael Steffens, Jan G Hengstler, Katja Ickstadt, Martin Schumacher

Volume 11, Issue 4

Citations: 16

Abstract

The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, these groups of individuals are identified using a clustering approach, where each group may be defined via different SNPs. This allows for representing complex interaction patterns, such as compositional epistasis, that might not be detected by a single main effects model. In a simulation study, the CLR approach results in improved prediction performance, compared to the main effects approach, and identification of important SNPs in several scenarios. Improved prediction performance is also obtained for an application example considering urinary bladder cancer. Some of the identified SNPs are predictive for all individuals, while others are only relevant for a specific group. Together with the sets of SNPs that define the groups, potential interaction patterns are uncovered.
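The idea of fitting a separate sparse model per group by putting observation weights into componentwise boosting can be sketched as follows. This is a minimal illustration, not the authors' implementation: it swaps in plain componentwise gradient boosting on the logistic loss (the abstract does not specify the boosting variant), it takes the groups as known labels instead of deriving them by clustering on SNPs, and the names `weighted_componentwise_boost`, `fit_cluster_localized`, and `outside_weight` are hypothetical. The data are synthetic.

```python
import numpy as np


def weighted_componentwise_boost(X, y, w, n_steps=50, nu=0.1):
    """Componentwise gradient boosting for logistic regression with weights.

    X: (n, p) SNP codes (e.g. 0/1/2), y: (n,) 0/1 outcome, w: (n,) weights >= 0.
    Each step updates only the single best-fitting SNP, so the fit stays sparse.
    Returns (intercept, beta) on the original scale of X.
    """
    n, p = X.shape
    w = w / w.sum()                      # normalise weights to sum to one
    mu = w @ X                           # weighted column means
    Xc = X - mu                          # centre so the intercept is separate
    denom = w @ (Xc ** 2)                # weighted sum of squares per SNP
    ybar = np.clip(w @ y, 1e-6, 1 - 1e-6)
    intercept = np.log(ybar / (1 - ybar))
    beta = np.zeros(p)
    for _ in range(n_steps):
        prob = 1.0 / (1.0 + np.exp(-(intercept + Xc @ beta)))
        resid = y - prob                 # negative gradient of the log-loss
        num = (w * resid) @ Xc
        # weighted least-squares slope of the residuals on each single SNP
        coef = np.divide(num, denom, out=np.zeros(p), where=denom > 1e-12)
        j = int(np.argmax(coef * num))   # SNP with the largest loss reduction
        beta[j] += nu * coef[j]
        intercept += nu * (w @ resid)
    return intercept - mu @ beta, beta


def fit_cluster_localized(X, y, clusters, outside_weight=0.1, **kwargs):
    """Fit one weighted model per group; members get weight 1, all other
    individuals get `outside_weight` (an assumed weighting scheme)."""
    fits = {}
    for k in np.unique(clusters):
        w = np.where(clusters == k, 1.0, outside_weight)
        fits[int(k)] = weighted_componentwise_boost(X, y, w, **kwargs)
    return fits


# Synthetic example: SNP 0 drives risk in group 0, SNP 1 in group 1.
rng = np.random.default_rng(0)
n, p = 400, 20
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
clusters = np.repeat([0, 1], n // 2)
driver = np.where(clusters == 0, X[:, 0], X[:, 1])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * (driver - 1.0)))).astype(float)

fits = fit_cluster_localized(X, y, clusters, n_steps=50, nu=0.1)
for k, (b0, beta) in fits.items():
    print("group", k, "selected SNPs:", np.flatnonzero(beta))
```

Down-weighting rather than dropping individuals outside a group lets SNPs that matter for everyone still borrow strength from the full sample, while group-specific SNPs dominate only in their own model.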

Source journal

Statistical Applications in Genetics and Molecular Biology BIOCHEMISTRY & MOLECULAR BIOLOGY-MATHEMATICAL & COMPUTATIONAL BIOLOGY

Self-citation rate

11.10%

Journal description: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and data base search strategies. Both original research and review articles will be warmly received.