Boost Protein Language Model with Injected Structure Information Through Parameter Efficient Fine-tuning

Zixun Zhang, Yuzhe Zhou, Jiayou Zheng, Chunmei Feng, Shuguang Cui, Sheng Wang, Zhen Li

Computers in Biology and Medicine, Volume 195, Article 110607 (2025). DOI: 10.1016/j.compbiomed.2025.110607
Abstract
Large-scale Protein Language Models (PLMs), such as the Evolutionary Scale Modeling (ESM) family, have significantly advanced our understanding of protein structures and functions. These models have shown immense potential in biomedical applications, including drug discovery, protein design, and understanding disease mechanisms at the molecular level. However, PLMs are typically pre-trained on residue sequences alone, with limited incorporation of structural information, presenting opportunities for further enhancement. In this paper, we propose Structure Information Injecting Tuning (SI-Tuning), a parameter-efficient fine-tuning method, to integrate structural information into PLMs. SI-Tuning keeps the original model parameters frozen while optimizing task-specific vectors for the input embeddings and attention maps. Structural features, including dihedral angles and distance maps, are used to derive these vectors, injecting structural information that improves model performance on downstream tasks. Extensive experiments with the 650M-parameter ESM-2 model demonstrate the effectiveness of SI-Tuning across multiple downstream tasks. Specifically, SI-Tuning achieves an accuracy of 93.95% on DeepLoc binary classification and 76.05% on Metal Ion Binding, outperforming SaProt, a large-scale pre-trained PLM with structural modeling. SI-Tuning effectively enhances the performance of PLMs by incorporating structural information in a parameter-efficient manner. Our method not only advances downstream task performance but also offers significant computational efficiency, making it a valuable strategy for applying large-scale PLMs to a broad range of biomedical downstream applications. Code is available at https://github.com/Nocturne0256/SI-tuning.
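The abstract describes the mechanism only at a high level: the backbone stays frozen, and small trainable modules turn structural features into additive biases on the input embeddings and the attention maps. The PyTorch sketch below illustrates one plausible realization of this idea under assumed shapes and names; the class names (SIEmbeddingBias, SIAttentionBias), the sin/cos dihedral encoding, and the distance-binning scheme are all hypothetical, not the authors' implementation (which is in the linked repository).

```python
# Minimal sketch of an SI-Tuning-style injection, assuming a frozen PLM
# backbone. Two small trainable modules map structural features into
# (1) a bias added to the input embeddings and (2) a per-head bias added
# to the pre-softmax attention logits.
import torch
import torch.nn as nn


class SIEmbeddingBias(nn.Module):
    """Project per-residue dihedral features into an embedding-space bias."""

    def __init__(self, num_angle_feats: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(num_angle_feats, d_model)

    def forward(self, dihedrals: torch.Tensor) -> torch.Tensor:
        # dihedrals: (batch, seq_len, num_angle_feats), e.g. sin/cos of phi/psi
        return self.proj(dihedrals)  # (batch, seq_len, d_model)


class SIAttentionBias(nn.Module):
    """Turn a pairwise distance map into a per-head additive attention bias."""

    def __init__(self, num_heads: int, num_bins: int = 16, max_dist: float = 20.0):
        super().__init__()
        self.num_bins = num_bins
        self.max_dist = max_dist
        self.bin_embed = nn.Embedding(num_bins, num_heads)

    def forward(self, dist_map: torch.Tensor) -> torch.Tensor:
        # dist_map: (batch, seq_len, seq_len) distances in Angstroms,
        # discretized into bins and embedded as one scalar per attention head.
        bins = torch.clamp(
            (dist_map / self.max_dist * self.num_bins).long(), 0, self.num_bins - 1
        )
        # (batch, L, L, num_heads) -> (batch, num_heads, L, L)
        return self.bin_embed(bins).permute(0, 3, 1, 2)


if __name__ == "__main__":
    batch, L, d_model, heads = 2, 50, 64, 4
    token_emb = torch.randn(batch, L, d_model)   # embeddings from the frozen PLM
    dihedrals = torch.randn(batch, L, 4)         # assumed sin/cos of phi and psi
    dist_map = torch.rand(batch, L, L) * 20.0    # assumed Ca-Ca distance map

    emb_bias = SIEmbeddingBias(num_angle_feats=4, d_model=d_model)
    attn_bias = SIAttentionBias(num_heads=heads)

    # Injection point 1: bias the input embeddings with dihedral information.
    x = token_emb + emb_bias(dihedrals)

    # Injection point 2: bias the (frozen) attention logits with the distance map.
    logits = torch.randn(batch, heads, L, L)     # stand-in for the PLM's logits
    attn = torch.softmax(logits + attn_bias(dist_map), dim=-1)

    print(x.shape, attn.shape)  # (2, 50, 64) and (2, 4, 50, 50)
```

In an actual fine-tuning run, only the two bias modules would receive gradients (every backbone parameter set to `requires_grad = False`), which is what makes the approach parameter-efficient.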
Journal Introduction:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.