Methyl-GP: accurate generic DNA methylation prediction based on a language model and representation learning.

IF 16.6 2区 生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY
Hao Xie, Leyao Wang, Yuqing Qian, Yijie Ding, Fei Guo
{"title":"Methyl-GP: accurate generic DNA methylation prediction based on a language model and representation learning.","authors":"Hao Xie, Leyao Wang, Yuqing Qian, Yijie Ding, Fei Guo","doi":"10.1093/nar/gkaf223","DOIUrl":null,"url":null,"abstract":"<p><p>Accurate prediction of DNA methylation remains a challenge. Identifying DNA methylation is important for understanding its functions and elucidating its role in gene regulation mechanisms. In this study, we propose Methyl-GP, a general predictor that accurately predicts three types of DNA methylation from DNA sequences. We found that the conservation of sequence patterns among different species contributes to enhancing the generalizability of the model. By fine-tuning a language model on a dataset comprising multiple species with similar sequence patterns and employing a fusion module to integrate embeddings into a high-quality comprehensive representation, Methyl-GP demonstrates satisfactory predictive performance in methylation identification. Experiments on 17 benchmark datasets for three types of DNA methylation (4mC, 5hmC, and 6mA) demonstrate the superiority of Methyl-GP over existing predictors. Furthermore, by utilizing the attention mechanism, we have visualized the sequence patterns learned by the model, which may help us to gain a deeper understanding of methylation patterns across various species.</p>","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"53 6","pages":""},"PeriodicalIF":16.6000,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11952970/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nucleic Acids Research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/nar/gkaf223","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Accurate prediction of DNA methylation remains a challenge. Identifying DNA methylation is important for understanding its functions and elucidating its role in gene regulation mechanisms. In this study, we propose Methyl-GP, a general predictor that accurately predicts three types of DNA methylation from DNA sequences. We found that the conservation of sequence patterns among different species contributes to enhancing the generalizability of the model. By fine-tuning a language model on a dataset comprising multiple species with similar sequence patterns and employing a fusion module to integrate embeddings into a high-quality comprehensive representation, Methyl-GP demonstrates satisfactory predictive performance in methylation identification. Experiments on 17 benchmark datasets for three types of DNA methylation (4mC, 5hmC, and 6mA) demonstrate the superiority of Methyl-GP over existing predictors. Furthermore, by utilizing the attention mechanism, we have visualized the sequence patterns learned by the model, which may help us to gain a deeper understanding of methylation patterns across various species.

甲基化gp:基于语言模型和表征学习的准确的通用DNA甲基化预测。
准确预测DNA甲基化仍然是一个挑战。鉴定DNA甲基化对于理解其功能和阐明其在基因调控机制中的作用具有重要意义。在这项研究中,我们提出了甲基gp,一个通用的预测器,准确地预测DNA序列的三种类型的DNA甲基化。研究发现,不同物种间序列模式的保守性有助于提高模型的通用性。通过对包含多个物种序列模式相似的数据集的语言模型进行微调,并使用融合模块将嵌入集成到高质量的综合表示中,甲基化gp在甲基化识别方面显示出令人满意的预测性能。对3种DNA甲基化类型(4mC、5hmC和6mA)的17个基准数据集进行的实验表明,甲基化gp比现有预测因子具有优越性。此外,通过利用注意机制,我们将模型学习到的序列模式可视化,这可能有助于我们更深入地了解不同物种的甲基化模式。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Nucleic Acids Research
Nucleic Acids Research 生物-生化与分子生物学
CiteScore
27.10
自引率
4.70%
发文量
1057
审稿时长
2 months
期刊介绍: Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信