Interpreting artificial neural networks to detect genome-wide association signals for complex traits.

Impact factor: 2.8. JCR: Q1 (Genetics & Heredity)

NAR Genomics and Bioinformatics. Pub Date: 2026-02-23. eCollection Date: 2026-03-01. DOI: 10.1093/nargab/lqag019

Burak Yelmen, Maris Alver, Merve Nur Güler, Flora Jay, Lili Milani

Volume 8(1), article lqag019. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12964191/pdf/


Abstract

Investigating the genetic architecture of complex diseases is challenging due to the multifactorial interplay of genomic and environmental influences. Although GWAS have identified thousands of variants for multiple complex traits, conventional statistical approaches can be limited by simplified assumptions such as linearity and lack of epistasis. In this work, we trained artificial neural networks using genome-wide genotype data to predict simulated and real complex traits. We extracted feature importance scores via different post hoc interpretability methods to identify potentially associated locus/loci (PAL) for the target phenotype and devised an approach for estimating P-values for the detected PAL. Simulations demonstrated that associated loci can be detected with good precision using strict selection criteria. By applying our approach to the schizophrenia cohort in the Estonian Biobank, we detected multiple loci not identified by linear methods. There was significant concordance between PAL and loci previously associated with schizophrenia and bipolar disorder, with enrichment analyses of genes within the identified PAL predominantly highlighting terms related to brain morphology and function. With advancements in model optimization and uncertainty quantification, artificial neural networks have the potential to enhance the identification of genomic loci associated with complex diseases, offering a more comprehensive approach for GWAS.
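The abstract does not say how P-values for the detected PAL were estimated. As a hedged illustration only, the sketch below shows one common way to attach P-values to feature importance scores: compare each observed score against a pool of null scores, for example scores obtained from models trained on permuted phenotypes. The permutation-based null, the function names `empirical_p_values` and `select_loci`, the threshold `alpha`, and all numbers are assumptions made for this example and are not taken from the paper.

```python
def empirical_p_values(observed_scores, null_scores):
    """Return one empirical P-value per observed importance score.

    P = (1 + number of null scores >= observed) / (1 + number of null scores).
    The +1 terms keep the estimate away from exactly zero.
    """
    n_null = len(null_scores)
    p_values = []
    for score in observed_scores:
        n_extreme = sum(1 for s in null_scores if s >= score)
        p_values.append((1 + n_extreme) / (1 + n_null))
    return p_values


def select_loci(locus_ids, p_values, alpha):
    """Keep loci whose empirical P-value is at or below the threshold alpha."""
    return [locus for locus, p in zip(locus_ids, p_values) if p <= alpha]


# Toy, made-up numbers purely for illustration (not from the paper).
observed = [0.10, 0.90, 0.20]  # hypothetical importance scores for three loci
null = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.10, 0.20, 0.05]  # hypothetical null pool
p = empirical_p_values(observed, null)
selected = select_loci(["locus_A", "locus_B", "locus_C"], p, alpha=0.1)
print(p, selected)
```

With a null pool of size n, the smallest attainable P-value under this scheme is 1/(n + 1), so a strict selection threshold would require a correspondingly large null pool.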
