DYNA: Disease-Specific Language Model for Variant Pathogenicity
Huixin Zhan, Zijun Zhang
arXiv - QuanBio - Genomics, 2024-05-31
DOI: arxiv-2406.00164 (https://doi.org/arxiv-2406.00164)
Citations: 0
Abstract
Classifying genetic variants as pathogenic or benign remains a challenge in
clinical genetics. Recently, the introduction of genomic foundation models has
improved generic variant effect prediction (VEP) accuracy via weakly supervised
or unsupervised training. However, these VEPs are not disease-specific, which
limits their adaptation at the point of care. To
address this problem, we propose DYNA: disease-specificity fine-tuning via a
Siamese neural network, broadly applicable to all genomic foundation models,
for more effective variant effect predictions in disease-specific contexts. We
evaluate DYNA on two distinct disease-relevant tasks. For coding VEPs, we focus
on various cardiovascular diseases, where the gene-disease relationship
(loss-of-function vs. gain-of-function) dictates disease-specific VEP. For
non-coding VEPs, we apply DYNA to RNA splicing, an essential
post-transcriptional regulatory axis and the most common non-coding pathogenic
mechanism in established clinical VEP guidelines. In both cases, DYNA
fine-tunes various
pre-trained genomic foundation models on small sets of rare variants. The
DYNA fine-tuned models show superior performance on the held-out rare variant
test set, and these results are further replicated on large, clinically
relevant variant annotations in ClinVar. Thus, DYNA offers a potent
disease-specific variant effect prediction method that excels in intra-gene
generalization and in generalization to unseen genetic variants, making it
particularly valuable for disease associations and clinical applicability.
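
The abstract names a Siamese neural network but gives no architectural detail, so the following is only a minimal sketch of the general Siamese idea, not DYNA's implementation. It assumes a toy shared encoder (one linear layer with tanh) standing in for a genomic foundation model, and a standard margin-based contrastive loss, which is an assumption on our part rather than something the abstract states. All names (`encode`, `contrastive_loss`, `siamese_pair_loss`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def encode(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Toy shared encoder: one linear layer followed by tanh.

    Assumption: this stands in for a pre-trained foundation model.
    """
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(distance: float, same_label: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss (assumed here, not taken from the abstract).

    Pairs with the same label are pulled together; pairs with different
    labels are pushed apart until they are at least `margin` away.
    """
    if same_label:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def siamese_pair_loss(
    weights: Sequence[Sequence[float]],
    x1: Sequence[float],
    x2: Sequence[float],
    same_label: bool,
    margin: float = 1.0,
) -> float:
    """Encode both inputs with the SAME weights, then score the pair."""
    z1 = encode(weights, x1)
    z2 = encode(weights, x2)  # weight sharing is what makes the network Siamese
    return contrastive_loss(euclidean(z1, z2), same_label, margin)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2x2 encoder weights
    print(siamese_pair_loss(identity, [1.0, 0.0], [0.0, 0.0], same_label=True))
    print(siamese_pair_loss(identity, [1.0, 0.0], [0.0, 0.0], same_label=False))
```

In a fine-tuning setting, the gradient of this pair loss with respect to the shared weights would update the encoder so that variants with the same disease-specific label embed close together; the sketch omits the optimization step entirely.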