Hybrid representation learning for human m6A modifications with chromosome-level generalizability.

IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Bioinformatics advances Pub Date : 2025-07-14 eCollection Date: 2025-01-01 DOI:10.1093/bioadv/vbaf170
Muhammad Tahir, Sheela Ramanna, Qian Liu
{"title":"Hybrid representation learning for human m<sup>6</sup>A modifications with chromosome-level generalizability.","authors":"Muhammad Tahir, Sheela Ramanna, Qian Liu","doi":"10.1093/bioadv/vbaf170","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong><math> <mrow> <mrow> <msup><mrow><mi>N</mi></mrow> <mn>6</mn></msup> </mrow> <mo>-</mo> <mtext>methyladenosine</mtext></mrow> </math> ( <math> <mrow> <mrow> <msup><mrow><mi>m</mi></mrow> <mn>6</mn></msup> </mrow> <mi>A</mi></mrow> </math> ) is the most abundant internal modification in eukaryotic mRNA and plays essential roles in post-transcriptional gene regulation. While several deep learning approaches have been proposed to predict <math> <mrow> <mrow> <msup><mrow><mi>m</mi></mrow> <mn>6</mn></msup> </mrow> <mi>A</mi></mrow> </math> sites, most suffer from limited chromosome-level generalizability due to evaluation on randomly split datasets.</p><p><strong>Results: </strong>In this study, we propose two novel hybrid deep learning models-Hybrid Model and Hybrid Deep Model-that integrate local sequence features (<i>k</i>-mers) and contextual embeddings via convolutional neural networks to improve predictive performance and generalization. We evaluate these models using both a Random-Split strategy and a more biologically realistic Leave-One-Chromosome-Out setting to ensure robustness across genomic regions. Our proposed models outperform the state-of-the-art m6A-TCPred model across all key evaluation metrics. Hybrid Deep Model achieves the highest accuracy under Random-Split, while Hybrid Model demonstrates superior generalization under Leave-One-Chromosome-Out, indicating that deep global representations may overfit in chromosome-independent settings. These findings underscore the importance of rigorous validation strategies and offer insights into designing robust <math> <mrow> <mrow> <msup><mrow><mi>m</mi></mrow> <mn>6</mn></msup> </mrow> <mi>A</mi></mrow> </math> predictors.</p><p><strong>Availability and implementation: </strong>Source code and datasets are available at: https://github.com/malikmtahir/LOCO-m6A.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf170"},"PeriodicalIF":2.8000,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288952/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics advances","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioadv/vbaf170","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Motivation: N 6 - methyladenosine ( m 6 A ) is the most abundant internal modification in eukaryotic mRNA and plays essential roles in post-transcriptional gene regulation. While several deep learning approaches have been proposed to predict m 6 A sites, most suffer from limited chromosome-level generalizability due to evaluation on randomly split datasets.

Results: In this study, we propose two novel hybrid deep learning models-Hybrid Model and Hybrid Deep Model-that integrate local sequence features (k-mers) and contextual embeddings via convolutional neural networks to improve predictive performance and generalization. We evaluate these models using both a Random-Split strategy and a more biologically realistic Leave-One-Chromosome-Out setting to ensure robustness across genomic regions. Our proposed models outperform the state-of-the-art m6A-TCPred model across all key evaluation metrics. Hybrid Deep Model achieves the highest accuracy under Random-Split, while Hybrid Model demonstrates superior generalization under Leave-One-Chromosome-Out, indicating that deep global representations may overfit in chromosome-independent settings. These findings underscore the importance of rigorous validation strategies and offer insights into designing robust m 6 A predictors.

Availability and implementation: Source code and datasets are available at: https://github.com/malikmtahir/LOCO-m6A.

Abstract Image

Abstract Image

Abstract Image

具有染色体水平通用性的人类m6A修饰的混合表示学习。
动机:n6 -甲基腺苷(n6 - methylladenosine, m6 A)是真核生物mRNA中最丰富的内部修饰,在转录后基因调控中起重要作用。虽然已经提出了几种深度学习方法来预测m6a位点,但由于对随机分割的数据集进行评估,大多数方法在染色体水平上的泛化性有限。结果:在这项研究中,我们提出了两种新的混合深度学习模型——混合模型和混合深度模型,它们通过卷积神经网络集成了局部序列特征(k-mers)和上下文嵌入,以提高预测性能和泛化。我们使用随机分裂策略和更符合生物学现实的留一条染色体设置来评估这些模型,以确保跨基因组区域的稳健性。我们提出的模型在所有关键评估指标上都优于最先进的m6A-TCPred模型。Hybrid深度模型在Random-Split下达到最高的精度,而Hybrid模型在leave - one - chromosomeout下表现出更好的泛化,这表明深度全局表示在染色体无关的情况下可能会过拟合。这些发现强调了严格验证策略的重要性,并为设计稳健的m6a预测器提供了见解。可用性和实现:源代码和数据集可在:https://github.com/malikmtahir/LOCO-m6A。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
1.60
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信