Are gene-by-environment interactions leveraged in multi-modality neural networks for breast cancer prediction?

arXiv - QuanBio - Genomics | Pub Date: 2024-07-30 | DOI: arxiv-2407.20978

Monica Isgut, Andrew Hornback, Yunan Luo, Asma Khimani, Neha Jain, May D. Wang


Abstract

Polygenic risk scores (PRSs) can significantly improve breast cancer risk prediction when combined with clinical risk factor data. While many studies have explored the added value of PRSs, little is known about whether gene-by-gene or gene-by-environment interactions can improve the risk discrimination of multi-modal models that combine PRSs with clinical data. In this study, we integrated data on 318 individual genotype variants along with clinical data in a neural network to explore whether gene-by-gene (i.e., between individual variants) and/or gene-by-environment (i.e., between clinical risk factors and variants) interactions could be leveraged jointly during training to improve breast cancer risk prediction performance. We benchmarked our approach against a baseline logistic regression model combining traditional univariate PRSs with clinical data, and ran an interpretability analysis to identify feature interactions. Although our model did not outperform the baseline, we identified 248 (<1%) statistically significant gene-by-gene and gene-by-environment interactions out of the ~53.6k possible feature pairs. The most contributory interactions involved rs6001930 (MKL1) and rs889312 (MAP3K1), and age and menopause were the most heavily interacting non-genetic risk factors. We also modeled the significant interactions as a network of highly connected features, which suggests that the model captures potential higher-order interactions. Although gene-by-environment (or gene-by-gene) interactions did not enhance breast cancer risk prediction performance in neural networks, our study provides evidence that these models can leverage such interactions to inform their predictions. This study represents the first application of neural networks to screen for interactions impacting breast cancer risk using real-world data.
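The pair counts quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not the authors' code. The 318 variants and 248 significant interactions come from the abstract. The number of clinical features (`N_CLINICAL = 10`) is an assumption: the abstract does not state it, and 10 is simply a value that reproduces a total of roughly 53.6k feature pairs. The helper `count_pairs` is a hypothetical name.

```python
from math import comb

N_VARIANTS = 318      # stated in the abstract
N_SIGNIFICANT = 248   # stated in the abstract
N_CLINICAL = 10       # ASSUMPTION: not stated in the abstract; chosen to match ~53.6k pairs


def count_pairs(n_variants: int, n_clinical: int) -> dict:
    """Count unordered feature pairs by interaction type."""
    gene_by_gene = comb(n_variants, 2)           # variant-variant pairs
    gene_by_env = n_variants * n_clinical        # variant-clinical pairs
    clinical_by_clinical = comb(n_clinical, 2)   # clinical-clinical pairs
    return {
        "gene_by_gene": gene_by_gene,
        "gene_by_environment": gene_by_env,
        "clinical_by_clinical": clinical_by_clinical,
        "total": gene_by_gene + gene_by_env + clinical_by_clinical,
    }


pairs = count_pairs(N_VARIANTS, N_CLINICAL)
fraction = N_SIGNIFICANT / pairs["total"]

print(pairs)
print(f"significant fraction: {fraction:.2%}")
```

Under this assumption, the gene-by-gene pairs alone number 50,403, so they dominate the search space. The significant fraction comes out just under half a percent, which is consistent with the "<1%" reported in the abstract. Whether clinical-by-clinical pairs were counted in the original ~53.6k is not stated; with 10 assumed clinical features, the total rounds to about 53.6k either way.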


