SEGT-GO: a graph transformer method based on PPI serialization and explanatory artificial intelligence for protein function prediction.
Authors: Yansong Wang, Yundong Sun, Baohui Lin, Haotian Zhang, Xiaoling Luo, Yumeng Liu, Xiaopeng Jin, Dongjie Zhu
Journal: BMC Bioinformatics, 26(1):46 (impact factor 2.9; JCR Q2, Biochemical Research Methods)
Published: 2025-02-10
DOI: https://doi.org/10.1186/s12859-025-06059-7
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11808960/pdf/
Citations: 0
Abstract
Background: A massive number of protein sequences have been obtained, but their functions remain challenging to discern. Protein-protein interaction (PPI) networks have played a crucial role in recent research on protein function prediction. Uncovering potential functional relationships between distant proteins within PPI networks is essential for improving prediction accuracy. Most current studies attempt to capture these distant relationships by stacking graph network layers, but performance gains diminish as the number of layers increases.
Results: To further explore the potential functional relationships between proteins that lie multiple hops apart in PPI networks, this paper proposes SEGT-GO, a Graph Transformer method based on PPI multi-hop neighborhood Serialization and Explainable artificial intelligence, for large-scale multispecies protein function prediction. Multi-hop neighborhood serialization maps multi-hop information in the PPI network into serialized feature embeddings, enabling the Graph Transformer to learn deeper functional features within the network. The game-theory-based SHAP eXplainable Artificial Intelligence (XAI) framework is used to optimize the model input and filter out feature noise, enhancing model performance.
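The abstract does not spell out how multi-hop neighborhoods are serialized, so the following is only a minimal sketch of one common approach: propagate node features through a normalized adjacency matrix k times and treat the result for each k as one token in a per-protein sequence. The function names (`normalized_adjacency`, `serialize_hops`), the choice of symmetric normalization with self-loops, and the toy graph are all assumptions made for illustration, not details confirmed by the paper.

```python
import math

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops (an assumed choice)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def serialize_hops(adj, feats, num_hops):
    """Return seq[node][hop] = feature vector aggregated over `hop` propagation steps.

    Hop 0 is the protein's own feature vector; hop k is the result of applying
    the normalized adjacency k times, so it mixes in proteins up to k edges away.
    """
    a_norm = normalized_adjacency(adj)
    hop_feats = [feats]
    for _ in range(num_hops):
        hop_feats.append(matmul(a_norm, hop_feats[-1]))
    # Re-index from [hop][node] to [node][hop] so each node owns one sequence.
    return [[hop_feats[k][v] for k in range(num_hops + 1)]
            for v in range(len(adj))]

# Hypothetical toy PPI graph: a path 0 - 1 - 2 - 3 with one-hot node features.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
feats = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
seq = serialize_hops(adj, feats, num_hops=3)
```

Each protein ends up with a fixed-length sequence of `num_hops + 1` vectors that a standard Transformer encoder can attend over, so distant neighbors are reached by lengthening the sequence rather than by stacking more graph layers. In the toy graph, protein 3 only contributes to protein 0's sequence from the hop-3 token onward.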
Conclusions: Compared with the advanced network method DeepGraphGO, SEGT-GO achieves more competitive results on standard large-scale datasets and superior results on small ones, validating its ability to extract functional information from deep (multi-hop) neighbors in the PPI network. Furthermore, SEGT-GO achieves superior results in cross-species learning and in predicting the functions of unseen proteins, further demonstrating the method's strong generalization.
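The SHAP-based input filtering mentioned in the Results rests on Shapley values from cooperative game theory. As a rough illustration of that idea only, the sketch below computes exact Shapley values by brute force for a tiny hypothetical model and keeps the features with the largest mean absolute attribution. The names `shapley_values`, `select_features` and `toy_model`, the zero baseline, and the top-k selection rule are assumptions for this example; the SHAP library relies on faster approximations, and the abstract does not describe the paper's actual filtering procedure.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to a baseline input.

    Cost grows as 2**n in the number of features, so this is only usable
    for very small inputs.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real value; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def select_features(predict, samples, baseline, keep):
    """Keep the `keep` features with the largest mean |Shapley value|."""
    n = len(baseline)
    importance = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(predict, x, baseline)):
            importance[i] += abs(p) / len(samples)
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:keep])

# Hypothetical toy model in which feature 2 is pure noise (it is ignored).
def toy_model(z):
    return 2.0 * z[0] + z[0] * z[1] + 0.0 * z[2]

samples = [[1.0, 2.0, 5.0], [2.0, 1.0, -3.0], [0.5, 3.0, 4.0]]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, samples[0], baseline)
kept = select_features(toy_model, samples, baseline, keep=2)
```

The attributions for one input sum to the difference between the model output at that input and at the baseline, which is the property that makes the ranking interpretable. In this toy setup the ignored feature receives zero attribution and is the one filtered out.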
About the journal:
BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology.
BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.