Burak Yelmen, Maris Alver, Merve Nur Güler, Flora Jay, Lili Milani
Title: Interpreting artificial neural networks to detect genome-wide association signals for complex traits
Journal: NAR Genomics and Bioinformatics, vol. 8, no. 1, lqag019 (published 2026-02-23)
DOI: https://doi.org/10.1093/nargab/lqag019
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12964191/pdf/
Abstract
Investigating the genetic architecture of complex diseases is challenging due to the multifactorial interplay of genomic and environmental influences. Although GWAS have identified thousands of variants for multiple complex traits, conventional statistical approaches can be limited by simplified assumptions such as linearity and lack of epistasis. In this work, we trained artificial neural networks using genome-wide genotype data to predict simulated and real complex traits. We extracted feature importance scores via different post hoc interpretability methods to identify potentially associated locus/loci (PAL) for the target phenotype and devised an approach for estimating P-values for the detected PAL. Simulations demonstrated that associated loci can be detected with good precision using strict selection criteria. By applying our approach to the schizophrenia cohort in the Estonian Biobank, we detected multiple loci not identified by linear methods. There was significant concordance between PAL and loci previously associated with schizophrenia and bipolar disorder, with enrichment analyses of genes within the identified PAL predominantly highlighting terms related to brain morphology and function. With advancements in model optimization and uncertainty quantification, artificial neural networks have the potential to enhance the identification of genomic loci associated with complex diseases, offering a more comprehensive approach for GWAS.
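The workflow the abstract outlines (train a network on genotypes, score each variant with a post hoc interpretability method, then attach a P-value to each score) can be illustrated with a toy example. The sketch below is not the authors' procedure: the abstract does not say which interpretability methods or which P-value estimator were used, so input-gradient saliency and a phenotype-permutation null are assumptions made here for illustration. The simulated data, the small NumPy network, and all function names are likewise hypothetical.

```python
import numpy as np

def simulate(n=600, m=20, causal=(3, 11), seed=0):
    """Toy genotypes (0/1/2) and a trait driven additively by two causal SNPs."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.2, 0.5, size=m)            # minor allele frequencies
    G = rng.binomial(2, maf, size=(n, m)).astype(float)
    y = G[:, list(causal)].sum(axis=1) + rng.normal(0.0, 0.5, size=n)
    X = (G - G.mean(axis=0)) / G.std(axis=0)       # standardize each SNP
    y = (y - y.mean()) / y.std()
    return X, y

def train_mlp(X, y, hidden=8, lr=0.05, epochs=300, seed=0):
    """One-hidden-layer tanh network fitted by full-batch gradient descent (MSE)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                   # hidden activations
        err = H @ W2 + b2 - y                      # residuals
        dZ = np.outer(err, W2) * (1.0 - H ** 2)    # backprop through tanh
        gW2, gb2 = H.T @ err / n, err.mean()
        gW1, gb1 = X.T @ dZ / n, dZ.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2

def saliency(params, X):
    """Mean absolute gradient of the prediction with respect to each SNP."""
    W1, b1, W2, _ = params
    H = np.tanh(X @ W1 + b1)
    grads = ((1.0 - H ** 2) * W2) @ W1.T           # shape (n, m)
    return np.abs(grads).mean(axis=0)

def empirical_pvalues(X, y, n_perm=49, seed=1):
    """Compare observed importance with a null built from permuted phenotypes."""
    observed = saliency(train_mlp(X, y), X)
    rng = np.random.default_rng(seed)
    exceed = np.zeros(X.shape[1])
    for k in range(n_perm):
        y_perm = rng.permutation(y)                # breaks genotype-trait link
        null = saliency(train_mlp(X, y_perm, seed=k + 1), X)
        exceed += null >= observed
    return observed, (1.0 + exceed) / (n_perm + 1.0)

X, y = simulate()
importance, pvals = empirical_pvalues(X, y)
top_two = sorted(np.argsort(importance)[-2:].tolist())
print("highest-importance SNP indices:", top_two)
print("their empirical P-values:", pvals[top_two])
```

With `n_perm` permutations the smallest attainable P-value is `1 / (n_perm + 1)`, so a genome-wide analysis would need far more permutations or a different null model, plus a correction for multiple testing. The toy trait here is purely additive; the epistatic effects the abstract mentions would need a simulated interaction term to be exercised.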