Xindi Wang, Yu Jiang, Yifan Sun. Genetic Epidemiology 48(3): 114-140. Journal Article, published 2024-02-05.
DOI: 10.1002/gepi.22549 (https://onlinelibrary.wiley.com/doi/10.1002/gepi.22549)
Journal impact factor: 1.7; JCR Q3 (Genetics & Heredity)
Citation count: 0
Revealing genomic heterogeneity and commonality: A penalized integrative analysis approach accounting for the adjacency structure of measurements
Advancements in high-throughput genomic technologies have revolutionized disease biomarker identification by providing large-scale genomic data. There is increasing focus on understanding the relationships among diverse patient groups with distinct disease subtypes and characteristics. Complex diseases exhibit both heterogeneity and shared genomic factors, so investigating these patterns is essential for accurately detecting markers and comprehensively understanding the diseases. Integrative analysis has emerged as a promising approach to this challenge. However, existing studies have been limited by ignoring the adjacency structure of genomic measurements such as single nucleotide polymorphisms (SNPs) and DNA methylation. In this study, we propose a structured integrative analysis method that incorporates a spline-type penalty to accommodate this adjacency structure. We use a fused-lasso-type penalty to identify both heterogeneity and commonality across the groups. Extensive simulations demonstrate its superiority over several directly competing methods. Analyses of The Cancer Genome Atlas (TCGA) melanoma data with DNA methylation measurements and GENEVA diabetes data with SNP measurements show that the proposed analysis leads to meaningful findings with better prediction performance and higher selection stability.
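The abstract names the ingredients of the method but not its exact objective function, so the following is only a minimal sketch of how such a penalized criterion could be written. It assumes a linear model fitted to each of K patient groups, coefficients stored as a p × K matrix whose rows are ordered by genomic position (so rows j and j+1 are adjacent SNPs or CpG sites), a lasso term for marker selection (an assumption, not stated in the abstract), a squared difference between adjacent coefficients as the spline-type term, and pairwise absolute differences across groups as the fused-lasso-type term. The function names and the tuning parameters lam1, lam2, lam3 are hypothetical. The code only evaluates the criterion and labels markers; it is not the authors' formulation or fitting algorithm.

```python
import numpy as np


def structured_objective(X_list, y_list, B, lam1, lam2, lam3):
    """Hypothetical penalized criterion for K groups sharing p ordered markers.

    X_list[k]: (n_k, p) design matrix for group k
    y_list[k]: (n_k,) response for group k
    B:         (p, K) coefficients; rows ordered by genomic position
    """
    K = B.shape[1]
    # Group-wise squared-error loss, each scaled by its own sample size.
    loss = sum(
        np.sum((y - X @ B[:, k]) ** 2) / (2 * len(y))
        for k, (X, y) in enumerate(zip(X_list, y_list))
    )
    # Assumed lasso term: shrinks coefficients of irrelevant markers to zero.
    sparsity = lam1 * np.sum(np.abs(B))
    # Spline-type (quadratic) term: neighbouring markers get similar effects.
    smooth = lam2 * np.sum(np.diff(B, axis=0) ** 2)
    # Fused-lasso-type term: pulls each marker's effects together across groups,
    # so exact ties signal commonality and remaining gaps signal heterogeneity.
    fused = 0.0
    for k in range(K):
        for l in range(k + 1, K):
            fused += np.sum(np.abs(B[:, k] - B[:, l]))
    return loss + sparsity + smooth + lam3 * fused


def classify_markers(B, tol=1e-8):
    """Label each marker (row of B) by its pattern across groups."""
    labels = []
    for row in B:
        if np.all(np.abs(row) <= tol):
            labels.append("unselected")      # zero in every group
        elif row.max() - row.min() <= tol:
            labels.append("common")          # same nonzero effect in all groups
        else:
            labels.append("heterogeneous")   # effect differs between groups
    return labels


if __name__ == "__main__":
    X_list = [np.eye(3), np.eye(3)]
    y_list = [np.array([1.0, 2.0, 2.0]), np.array([1.0, 0.0, 0.0])]
    B = np.array([[1.0, 1.0], [2.0, 0.0], [2.0, 0.0]])
    print(structured_objective(X_list, y_list, B, lam1=0.5, lam2=2.0, lam3=0.25))
    print(classify_markers(B))
```

Here the perfect fit makes the loss zero, so the printed value is the penalty total 0.5·6 + 2·2 + 0.25·4 = 8.0, and the markers are labelled common, heterogeneous, heterogeneous. The design choice to illustrate is the contrast between the two structural terms: the quadratic adjacent-difference penalty smooths effects along the genome without forcing them to be equal, whereas the absolute-value penalty across groups can set effects exactly equal, which is what makes shared markers identifiable.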
About the journal:
Genetic Epidemiology is a peer-reviewed journal for discussion of research on the genetic causes of the distribution of human traits in families and populations. Emphasis is placed on the relative contribution of genetic and environmental factors to human disease as revealed by genetic, epidemiological, and biologic investigations.
Genetic Epidemiology primarily publishes papers in statistical genetics, a research field concerned mainly with the development of statistical, bioinformatical, and computational models for analyzing genetic data. Incorporation of underlying biology and population genetics into conceptual models is favored. The Journal seeks original articles comprising either applied research or innovative statistical, mathematical, computational, or genomic methodologies that advance studies in genetic epidemiology. Other types of reports are encouraged, such as letters to the editor, topic reviews, and perspectives from other fields of research that will likely enrich the field of genetic epidemiology.