ResLysEmbed: a ResNet-based framework for succinylated lysine residue prediction using sequence and language model embeddings.

IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Bioinformatics advances Pub Date : 2025-08-22 eCollection Date: 2025-01-01 DOI:10.1093/bioadv/vbaf198
Souvik Ghosh, Md Muhaiminul Islam Nafi, M Saifur Rahman
Journal Article. Bioinformatics advances, volume 5 1, pages vbaf198. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12413228/pdf/
Citations: 0

Abstract

Motivation: Lysine (K) succinylation is a crucial post-translational modification involved in cellular homeostasis and metabolism, and recent research has linked it to several diseases. Despite its emerging importance, current computational methods for predicting succinylation sites remain limited in performance.

Results: We propose ResLysEmbed, a novel ResNet-based architecture that combines traditional word embeddings with per-residue embeddings from protein language models for succinylation site prediction. We also compared multiple protein language models to identify the most effective one for this task. Additionally, we experimented with several deep learning architectures to find the one best suited to processing word embedding features, and developed three hybrid architectures: ConvLysEmbed, InceptLysEmbed, and ResLysEmbed. Among these, ResLysEmbed performed best, achieving accuracy, MCC, and F1 scores of 0.81, 0.39, and 0.40 and of 0.72, 0.44, and 0.67, respectively, on two independent test sets, outperforming existing methods. Furthermore, we applied SHapley Additive exPlanations (SHAP) analysis to interpret how each residue within the 33-residue window around the target site influences the model's predictions. This analysis helps clarify how the sequential position and structural distance of residues from the target site affect their contribution to succinylation prediction.
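
To make the 33-residue window concrete, the following is a minimal sketch of how such windows might be cut from a protein sequence, with 16 residues on each side of every lysine. It is not taken from the authors' repository: the function name `lysine_windows`, the padding character `X` for positions beyond the sequence ends, and the 0-based indexing are all assumptions made for illustration.

```python
from typing import List, Tuple

WINDOW = 33          # window length stated in the abstract
HALF = WINDOW // 2   # 16 residues on each side of the target lysine
PAD = "X"            # assumption: placeholder for positions past the sequence ends


def lysine_windows(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based position, 33-residue window) for every lysine (K)."""
    seq = sequence.upper()
    padded = PAD * HALF + seq + PAD * HALF
    windows = []
    for pos, residue in enumerate(seq):
        if residue == "K":
            # In the padded string the target sits at index pos + HALF,
            # so the slice [pos, pos + WINDOW) is centred on it.
            windows.append((pos, padded[pos:pos + WINDOW]))
    return windows


if __name__ == "__main__":
    for position, window in lysine_windows("MKAAAK"):
        print(position, window)
```

Each window could then be fed to a word-embedding layer, while the per-residue language model embeddings would be computed separately; how the two are combined is specific to the architecture and is not sketched here.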

Availability: The implementation details and code are available at https://github.com/Sheldor7701/ResLysEmbed.
