ResLysEmbed: a ResNet-based framework for succinylated lysine residue prediction using sequence and language model embeddings.

Impact factor: 2.8 | JCR: Q2 | Category: Mathematical & Computational Biology

Bioinformatics advances | Pub Date: 2025-08-22 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf198

Souvik Ghosh, Md Muhaiminul Islam Nafi, M Saifur Rahman

Bioinformatics advances, vol. 5(1), vbaf198. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12413228/pdf/

Citations: 0

Abstract

Motivation: Lysine (K) succinylation is a crucial post-translational modification involved in cellular homeostasis and metabolism, and recent research has linked it to several diseases. Despite its emerging importance, current computational methods for predicting succinylation sites remain limited in performance.

Results: We propose ResLysEmbed, a novel ResNet-based architecture that combines traditional word embeddings with per-residue embeddings from protein language models for succinylation site prediction. We also compared multiple protein language models to identify the most effective one for this task. Additionally, we experimented with several deep learning architectures to find the one best suited to processing word embedding features, and developed three hybrid architectures: ConvLysEmbed, InceptLysEmbed, and ResLysEmbed. Among these, ResLysEmbed performed best, achieving accuracy, MCC, and F1 scores of 0.81, 0.39, and 0.40 on one independent test set and 0.72, 0.44, and 0.67 on the other, outperforming existing methods. Furthermore, we applied SHapley Additive exPlanations (SHAP) analysis to interpret how each residue within the 33-residue window around the target site influences the model's predictions. This analysis helps explain how the sequential position and structural distance of residues from the target site affect their contribution to succinylation prediction.
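The 33-residue window mentioned above can be illustrated with a minimal sketch. It assumes the window is centered on the candidate lysine (16 residues on each side) and that positions beyond the protein termini are filled with a placeholder symbol. The function name `lysine_windows` and the padding character `X` are hypothetical choices for illustration, not taken from the ResLysEmbed code base, which may handle termini differently.

```python
from typing import List, Tuple

WINDOW = 33  # window length stated in the abstract
PAD = "X"    # hypothetical padding symbol for positions beyond the termini


def lysine_windows(sequence: str, window: int = WINDOW,
                   pad: str = PAD) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in a sequence."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the lysine sits at the center")
    half = window // 2
    seq = sequence.upper()
    # Pad both ends so lysines near a terminus still get a full-length window.
    padded = pad * half + seq + pad * half
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i sits at index i + half in the padded string, so the
            # slice [i, i + window) is centered on it.
            windows.append((i + 1, padded[i:i + window]))
    return windows


if __name__ == "__main__":
    for position, fragment in lysine_windows("MKTAYIAKQR"):
        print(position, fragment)
```

Each returned fragment is one candidate site. In a pipeline like the one described, such fragments would be the inputs from which word embeddings and per-residue language model embeddings are derived, though how the authors build those features is not specified here.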

Availability: The implementation details and code are available at https://github.com/Sheldor7701/ResLysEmbed.



Source journal

Bioinformatics advances

CiteScore

1.60

Self-citation rate

0.00%

Articles published