预测致病调控变异的深度学习模型的比较分析。

bioRxiv : the preprint server for biology Pub Date : 2025-06-11 DOI:10.1101/2025.05.19.654920

Gaetano Manzo, Kathryn Borkowski, Ivan Ovcharenko

{"title":"预测致病调控变异的深度学习模型的比较分析。","authors":"Gaetano Manzo, Kathryn Borkowski, Ivan Ovcharenko","doi":"10.1101/2025.05.19.654920","DOIUrl":null,"url":null,"abstract":"Motivation: Genome-wide association studies (GWAS) have identified numerous noncoding variants associated with complex human diseases, disorders, and traits. However, resolving the uncertainty between GWAS association and causality remains a significant challenge. The small subset of noncoding GWAS variants with causative effects on gene regulatory elements can only be detected through accurate methods that assess the impact of DNA sequence variation on gene regulatory activity. Deep learning models, such as those based on Convolutional Neural Networks (CNNs) and transformers, have gained prominence in predicting the regulatory effects of genetic variants, particularly in enhancers, by learning patterns from genomic and epigenomic data. Despite their potential, selecting the most suitable model is hindered by the lack of standardized benchmarks, consistent training conditions, and performance evaluation criteria in existing reviews.Results: This study evaluates state-of-the-art deep learning models for predicting the effects of genetic variants on enhancer activity using nine datasets stemming from MPRA, raQTL, and eQTL experiments, profiling the regulatory impact of 54,859 SNPs across four human cell lines. The results reveal that CNN models, such as TREDNet and SEI, consistently outperform other architectures in predicting the regulatory impact of single-nucleotide polymorphisms (SNPs). However, hybrid CNN-transformer models, such as Borzoi, display superior performance in identifying causal SNPs within a linkage disequilibrium block. While fine-tuning enhances the performance of transformer-based models, it remains insufficient to surpass CNN and hybrid models when evaluated under optimized conditions.","PeriodicalId":519960,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12190767/pdf/","citationCount":"0","resultStr":"{\"title\":\"Comparative Analysis of Deep Learning Models for Predicting Causative Regulatory Variants.\",\"authors\":\"Gaetano Manzo, Kathryn Borkowski, Ivan Ovcharenko\",\"doi\":\"10.1101/2025.05.19.654920\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Motivation: Genome-wide association studies (GWAS) have identified numerous noncoding variants associated with complex human diseases, disorders, and traits. However, resolving the uncertainty between GWAS association and causality remains a significant challenge. The small subset of noncoding GWAS variants with causative effects on gene regulatory elements can only be detected through accurate methods that assess the impact of DNA sequence variation on gene regulatory activity. Deep learning models, such as those based on Convolutional Neural Networks (CNNs) and transformers, have gained prominence in predicting the regulatory effects of genetic variants, particularly in enhancers, by learning patterns from genomic and epigenomic data. Despite their potential, selecting the most suitable model is hindered by the lack of standardized benchmarks, consistent training conditions, and performance evaluation criteria in existing reviews.Results: This study evaluates state-of-the-art deep learning models for predicting the effects of genetic variants on enhancer activity using nine datasets stemming from MPRA, raQTL, and eQTL experiments, profiling the regulatory impact of 54,859 SNPs across four human cell lines. The results reveal that CNN models, such as TREDNet and SEI, consistently outperform other architectures in predicting the regulatory impact of single-nucleotide polymorphisms (SNPs). However, hybrid CNN-transformer models, such as Borzoi, display superior performance in identifying causal SNPs within a linkage disequilibrium block. While fine-tuning enhances the performance of transformer-based models, it remains insufficient to surpass CNN and hybrid models when evaluated under optimized conditions.\",\"PeriodicalId\":519960,\"journal\":{\"name\":\"bioRxiv : the preprint server for biology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-06-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12190767/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"bioRxiv : the preprint server for biology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1101/2025.05.19.654920\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"bioRxiv : the preprint server for biology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2025.05.19.654920","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

动机：全基因组关联研究（GWAS）已经确定了许多与复杂的人类疾病、失调和性状相关的非编码变异。然而，解决GWAS关联和因果关系之间的不确定性仍然是一个重大挑战。只有通过评估DNA序列变异对基因调控活性影响的准确方法，才能检测到对基因调控元件有致病作用的一小部分非编码GWAS变异。深度学习模型，如基于卷积神经网络（cnn）和转换器的模型，通过从基因组和表观基因组数据中学习模式，在预测遗传变异（特别是增强子）的调控效应方面取得了突出成就。尽管它们具有潜力，但由于缺乏标准化的基准、一致的训练条件和现有审查中的绩效评价标准，阻碍了选择最合适的模型。结果：本研究使用来自MPRA、raQTL和eQTL实验的9个数据集评估了最先进的深度学习模型，用于预测遗传变异对增强子活性的影响，分析了4种人类细胞系中54,859个snp的调控影响。结果表明，CNN模型，如TREDNet和SEI，在预测单核苷酸多态性（snp）的调控影响方面始终优于其他架构。然而，混合CNN-transformer模型，如Borzoi，在识别连锁不平衡块中的因果snp方面表现优异。虽然微调提高了基于变压器的模型的性能，但在优化条件下进行评估时，仍不足以超越CNN和混合模型。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Comparative Analysis of Deep Learning Models for Predicting Causative Regulatory Variants.

Motivation: Genome-wide association studies (GWAS) have identified numerous noncoding variants associated with complex human diseases, disorders, and traits. However, resolving the uncertainty between GWAS association and causality remains a significant challenge. The small subset of noncoding GWAS variants with causative effects on gene regulatory elements can only be detected through accurate methods that assess the impact of DNA sequence variation on gene regulatory activity. Deep learning models, such as those based on Convolutional Neural Networks (CNNs) and transformers, have gained prominence in predicting the regulatory effects of genetic variants, particularly in enhancers, by learning patterns from genomic and epigenomic data. Despite their potential, selecting the most suitable model is hindered by the lack of standardized benchmarks, consistent training conditions, and performance evaluation criteria in existing reviews.

Results: This study evaluates state-of-the-art deep learning models for predicting the effects of genetic variants on enhancer activity using nine datasets stemming from MPRA, raQTL, and eQTL experiments, profiling the regulatory impact of 54,859 SNPs across four human cell lines. The results reveal that CNN models, such as TREDNet and SEI, consistently outperform other architectures in predicting the regulatory impact of single-nucleotide polymorphisms (SNPs). However, hybrid CNN-transformer models, such as Borzoi, display superior performance in identifying causal SNPs within a linkage disequilibrium block. While fine-tuning enhances the performance of transformer-based models, it remains insufficient to surpass CNN and hybrid models when evaluated under optimized conditions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

bioRxiv : the preprint server for biology

自引率

0.00%

发文量