DeepAnnotation: A novel interpretable deep learning-based genomic selection model that integrates comprehensive functional annotations.

IF 11.8 | Zone 2, Biology | Q1 | Multidisciplinary Sciences

GigaScience, Volume 14 | Pub Date: 2025-01-06 | DOI: 10.1093/gigascience/giaf083 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12392413/pdf/

Wenlong Ma, Weigang Zheng, Shenghua Qin, Chao Wang, Bowen Lei, Yuwen Liu


Citations: 0

Abstract


DeepAnnotation: A novel interpretable deep learning-based genomic selection model that integrates comprehensive functional annotations.

Background: Genomic selection, which leverages genomic information to predict the breeding value of individuals, has dramatically accelerated the improvement of economically important traits. The growing availability of multiomics data in agricultural species offers an unprecedented opportunity to enrich this process with prior biological knowledge. However, fully harnessing these rich data sources for accurate phenotype prediction in genomic selection remains in its early stages.

Results: In this study, we present DeepAnnotation, a novel interpretable genomic selection model that predicts phenotypes by integrating comprehensive multiomics functional annotations using deep learning. To capture the complex information flow from genotype to phenotype, DeepAnnotation aligns multiomics biological annotations with sequential network layers in a deep learning architecture, mirroring the natural regulatory cascade from genotype to intermediate molecular phenotypes (such as cis-regulatory elements, genes, and gene modules) and ultimately to phenotypes of economic traits. Compared with 7 classical models (rrBLUP, LightGBM, KAML, BLUP, BayesR, MBLUP, and BayesRC), DeepAnnotation achieved significantly higher prediction accuracy (Pearson correlation coefficient increased by 6.4% to 120.0%) and better computational efficiency for 3 pork production traits (lean meat percentage, loin muscle depth, and back fat thickness), with the advantage particularly evident in identifying top-performing individuals. The evaluation used 1,700 training Duroc boars and 240 independent validation individuals, each genotyped for 11,633,164 single-nucleotide polymorphisms (SNPs). Furthermore, the interpretability embedded within the framework enables the identification of potential causal SNPs and the exploration of the molecular mechanisms through which they mediate trait variation.
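The idea of aligning annotations with network layers can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes, purely for illustration, that each annotation (SNP to cis-regulatory element, element to gene, gene to module) is encoded as a 0/1 mask that switches off every unannotated connection in a layer. The mask values, layer sizes, and all function names are hypothetical.

```python
import random

# Hypothetical toy annotation masks (illustrative only, not taken from the paper).
# mask[i][j] == 1 means input unit i is annotated as linked to output unit j.
SNP_TO_CRE = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]  # 4 SNPs -> 3 elements
CRE_TO_GENE = [[1, 0], [1, 0], [0, 1]]                     # 3 elements -> 2 genes
GENE_TO_MODULE = [[1], [1]]                                # 2 genes -> 1 module
MODULE_TO_TRAIT = [[1]]                                    # 1 module -> 1 trait
MASKS = [SNP_TO_CRE, CRE_TO_GENE, GENE_TO_MODULE, MODULE_TO_TRAIT]


def masked_layer(inputs, weights, mask, activate=True):
    """Linear layer whose weights are zeroed wherever the mask is 0."""
    n_out = len(mask[0])
    out = [
        sum(x * w_row[j] * m_row[j] for x, w_row, m_row in zip(inputs, weights, mask))
        for j in range(n_out)
    ]
    # ReLU on hidden layers; the final trait layer stays linear.
    return [max(0.0, v) for v in out] if activate else out


def forward(genotype, weights_per_layer):
    """Return the activations of every layer, from regulatory elements to the trait."""
    activations = []
    hidden = genotype
    last = len(MASKS) - 1
    for idx, (weights, mask) in enumerate(zip(weights_per_layer, MASKS)):
        hidden = masked_layer(hidden, weights, mask, activate=(idx < last))
        activations.append(hidden)
    return activations


def random_weights(seed=0):
    """Untrained random weights with the same shapes as the masks."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in row] for row in mask] for mask in MASKS]


if __name__ == "__main__":
    genotype = [0, 1, 2, 1]  # allele counts for 4 hypothetical SNPs
    cre, genes, modules, trait = forward(genotype, random_weights())
    print("predicted trait value (untrained weights):", trait[0])
```

Because every hidden unit corresponds to a named biological entity in such a design, a signal can be traced from a SNP through the elements, genes, and modules it is annotated to, which is one plausible way the interpretability described above could arise.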

Conclusions: DeepAnnotation is an open-source, interpretable deep learning approach for phenotype prediction, leveraging comprehensive multiomics functional annotations. Freely accessible via GitHub and Docker, it provides a valuable tool for researchers and practitioners in genomic selection.


Source journal

GigaScience (Multidisciplinary Sciences)

CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week

About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.