Prediction of Plant Ubiquitylation Proteins and Sites by Fusing Multiple Features

IF 2.9 Zone 3 (Biology) Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics Pub Date : 2023-09-08 DOI:10.2174/1574893618666230908092847

Meng-Yue Guan, Wang-Ren Qiu, Qian-Kun Wang, Xuan Xiao


Citations: 1

Abstract

Introduction: Protein ubiquitylation is an important post-translational modification (PTM) and is considered one of the most important processes regulating cell function and various diseases. Accurate prediction of ubiquitylation proteins and their PTM sites is therefore of great significance for the study of basic biological processes and the development of related drugs. Researchers have developed several large-scale computational methods to predict ubiquitylation sites, but there is still much room for improvement. Much of the research on ubiquitylation is cross-species, yet life forms are diverse, and prediction methods tend to be species-specific in practical application. This study focuses on plants and constructs computational methods for identifying ubiquitylation proteins and ubiquitylation sites.

Method: We constructed two predictive models, one to identify plant ubiquitylation proteins and one to identify ubiquitylation sites. In the protein prediction model, to better reflect protein sequence information and obtain better prediction results, features are extracted with a KNN scoring matrix model based on functional-domain Gene Ontology (GO) annotation and with word embedding models, i.e. Skip-Gram and Continuous Bag of Words (CBOW); a light gradient boosting machine (LGBM) is selected as the prediction engine.

Results: For the protein prediction model, accuracy (ACC), Precision, Recall, F1_score and AUC are 85.12%, 80.96%, 72.80%, 76.37% and 0.9193, respectively, in 10-fold cross-validation on the independent dataset. In the ubiquitylation site prediction model, Skip-Gram, CBOW and enhanced amino acid composition (EAAC) encodings were used to extract features from protein sequence fragments, and the predictions on both the training and independent test data also achieved good performance.

Conclusion: In summary, the comparison results demonstrate that our models have a clear advantage in predicting ubiquitylation proteins and sites, and they may provide useful insights for studying the mechanisms and modulation of ubiquitination pathways.
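
To make the EAAC encoding named in the abstract more concrete, the following is a minimal Python sketch of how such a feature vector is commonly computed: a window slides along a fixed-length sequence fragment, and the frequency of each of the 20 standard amino acids is recorded for every window position. The abstract does not give the fragment length, the window size, or how padding characters are handled, so the 21-residue fragment, the window of 5, and the choice to divide by the window size are all assumptions made for illustration; the function name `eaac` and the example sequence are hypothetical, not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def eaac(fragment, window=5):
    """Return the EAAC feature vector of a sequence fragment.

    For each sliding window, 20 values are emitted: the count of each
    standard amino acid divided by the window size (assumed normalisation).
    Characters outside AMINO_ACIDS (e.g. padding such as '-') contribute
    nothing to any frequency.
    """
    if window < 1 or window > len(fragment):
        raise ValueError("window must be between 1 and len(fragment)")
    features = []
    for start in range(len(fragment) - window + 1):
        counts = Counter(fragment[start:start + window])
        features.extend(counts[aa] / window for aa in AMINO_ACIDS)
    return features


# Hypothetical 21-residue fragment with a candidate lysine (K) at the centre.
fragment = "MTAYIAQRQIKSFVSHFSRQL"
vector = eaac(fragment)
# 21 - 5 + 1 = 17 windows, each contributing 20 frequencies.
print(len(vector))
```

A vector like this could be concatenated with Skip-Gram or CBOW embeddings of the same fragment before being passed to a classifier; how the paper actually combines the features is not stated in the abstract.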


Source journal: Current Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 6.60

Self-citation rate: 2.50%

Review time: >12 weeks

About the journal: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely in-depth reviews, mini-reviews, research papers and guest-edited thematic issues written by leaders in the field, covering a wide range of topics in the integration of biology with computer and information science. The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.