{"title":"Methyl-GP: accurate generic DNA methylation prediction based on a language model and representation learning.","authors":"Hao Xie, Leyao Wang, Yuqing Qian, Yijie Ding, Fei Guo","doi":"10.1093/nar/gkaf223","DOIUrl":null,"url":null,"abstract":"<p><p>Accurate prediction of DNA methylation remains a challenge. Identifying DNA methylation is important for understanding its functions and elucidating its role in gene regulation mechanisms. In this study, we propose Methyl-GP, a general predictor that accurately predicts three types of DNA methylation from DNA sequences. We found that the conservation of sequence patterns among different species contributes to enhancing the generalizability of the model. By fine-tuning a language model on a dataset comprising multiple species with similar sequence patterns and employing a fusion module to integrate embeddings into a high-quality comprehensive representation, Methyl-GP demonstrates satisfactory predictive performance in methylation identification. Experiments on 17 benchmark datasets for three types of DNA methylation (4mC, 5hmC, and 6mA) demonstrate the superiority of Methyl-GP over existing predictors. Furthermore, by utilizing the attention mechanism, we have visualized the sequence patterns learned by the model, which may help us to gain a deeper understanding of methylation patterns across various species.</p>","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"53 6","pages":""},"PeriodicalIF":16.6000,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11952970/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nucleic Acids Research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/nar/gkaf223","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Accurate prediction of DNA methylation remains a challenge. Identifying DNA methylation is important for understanding its functions and elucidating its role in gene regulation mechanisms. In this study, we propose Methyl-GP, a general predictor that accurately predicts three types of DNA methylation from DNA sequences. We found that the conservation of sequence patterns among different species contributes to enhancing the generalizability of the model. By fine-tuning a language model on a dataset comprising multiple species with similar sequence patterns and employing a fusion module to integrate embeddings into a high-quality comprehensive representation, Methyl-GP demonstrates satisfactory predictive performance in methylation identification. Experiments on 17 benchmark datasets for three types of DNA methylation (4mC, 5hmC, and 6mA) demonstrate the superiority of Methyl-GP over existing predictors. Furthermore, by utilizing the attention mechanism, we have visualized the sequence patterns learned by the model, which may help us to gain a deeper understanding of methylation patterns across various species.
期刊介绍:
Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.