Rprot-Vec: a deep learning approach for fast protein structure similarity calculation.

IF 3.3 | Zone 3 (Biology) | JCR Q2 (Biochemical Research Methods)
Yichuan Zhang, Wen Zhang
BMC Bioinformatics, 26(1), 171. Journal Article. Published 2025-07-10.
DOI: https://doi.org/10.1186/s12859-025-06213-1
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12243341/pdf/
Citations: 0

Abstract



Background: Predicting protein structural similarity and detecting homologous sequences remain fundamental and challenging tasks in computational biology. Accurate identification of structural homologs enables function inference for newly discovered or unannotated proteins. Traditional approaches often require full 3D structural data, which is unavailable for most proteins. Thus, there is a need for sequence-based methods capable of inferring structural similarity efficiently and at scale.

Result: We present Rprot-Vec (Rapid Protein Vector), a deep learning model that predicts protein structural similarity and performs homology detection using only primary sequence data. The model integrates bidirectional GRU and multi-scale CNN layers with ProtT5-based encoding, enabling accurate and fast similarity estimation. Rprot-Vec achieves a 65.3% accurate similarity prediction rate in the homologous region (TM-score > 0.8), with an average prediction error of 0.0561 across all TM-score intervals. Despite having only 41% of the parameters of TM-vec, Rprot-Vec outperforms both public and locally trained TM-vec baselines in all tested settings. Additionally, we constructed and released three curated training datasets (CATH_TM_score_S/M/L), supporting further research in this area.
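
The two headline figures above (an average prediction error across TM-score intervals, and an accurate-prediction rate in the homologous region with TM-score > 0.8) can be illustrated with a small sketch. The abstract does not define how a prediction counts as "accurate", so the `tolerance` parameter below is an assumption, as are the function names and the toy values; none of this is taken from the authors' code or results.

```python
from statistics import mean


def mean_abs_error(predicted, reference):
    """Average absolute difference between predicted and reference TM-scores."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def accurate_rate(predicted, reference, lower=0.8, tolerance=0.05):
    """Fraction of pairs with reference TM-score > `lower` whose prediction
    lies within `tolerance` of the reference.

    `tolerance` is a hypothetical criterion; the abstract does not state one.
    """
    in_region = [(p, r) for p, r in zip(predicted, reference) if r > lower]
    if not in_region:
        return 0.0
    hits = sum(1 for p, r in in_region if abs(p - r) <= tolerance)
    return hits / len(in_region)


# Toy values for illustration only, not results from the paper.
reference = [0.92, 0.85, 0.81, 0.40, 0.25]
predicted = [0.90, 0.75, 0.83, 0.45, 0.20]

overall_mae = mean_abs_error(predicted, reference)
homolog_rate = accurate_rate(predicted, reference)

print(f"mean absolute error: {overall_mae:.3f}")
print(f"accurate rate (TM-score > 0.8): {homolog_rate:.3f}")
```

Restricting the rate to pairs whose reference TM-score exceeds 0.8 mirrors the abstract's focus on the homologous region, while the mean absolute error is computed over all pairs.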

Conclusion: Rprot-Vec offers a fast and lightweight solution for sequence-based structural similarity prediction. It can be applied in protein homology detection, structure-function inference, drug repurposing, and other downstream biological tasks. Its open-source availability and released datasets facilitate broader adoption and further development by the research community.

Source journal
BMC Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.