Ubi-XGB: Identification of Ubiquitin Proteins Using a Machine Learning Model

Journal of Mountain Area Research, Pub Date: 2022-12-02, DOI: 10.53874/jmar.v8i0.167

Sikandar Rahu, A. Ghulam, A. Farman, Dhani Bux Talpur, Mir Sajjad Hussain Talpur, Erum Saba, Z. A. Maher, Saima Tunio


Citations: 4

Abstract

A recent line of research has focused on ubiquitination, a pervasive process that mediates proteasomal protein degradation, controls apoptosis, and plays a major role in protein breakdown and the development of cell disorders. Protein turnover and ubiquitination are closely related processes. Recent developments in proteomic technology have renewed interest in identifying ubiquitination sites in a number of human disorders, which have been studied experimentally and clinically. Because experimental identification of protein ubiquitination sites is typically labor- and time-intensive, reliable computational predictors are vital. In this paper, we propose Ubipro-XGBoost, a multi-view, feature-based machine learning method for predicting ubiquitination sites; as more experimentally verified sites become available, it can locate lysine ubiquitination sites in large-scale proteome data. Protein sequences are first encoded into feature matrices using two schemes: Dipeptide Deviation from Expected Mean (DDE) and dipeptide composition (DPC). These features are then fed into an extreme gradient boosting (XGBoost) classifier. In 5-fold cross-validation, with performance also summarized by the area under the receiver operating characteristic curve (AUC), Ubipro-XGBoost achieved an accuracy of 0.914, sensitivity of 0.836, specificity of 0.992, and Matthews correlation coefficient (MCC) of 0.839 with DPC features, and an accuracy of 0.909, sensitivity of 0.839, specificity of 0.979, and MCC of 0.829 with DDE features. These findings indicate that Ubipro-XGBoost outperforms conventional ubiquitination prediction methods and offers new guidance for ubiquitination site identification.
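
The two encodings and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so DPC and DDE follow the definitions commonly used in the literature (DDE standardizes each dipeptide frequency against a codon-based theoretical mean and variance), and the function names `dpc`, `dde`, and `binary_metrics` are hypothetical.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the feature index.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS = {"A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
          "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
          "T": 4, "V": 4, "W": 1, "Y": 2}
SENSE_CODONS = sum(CODONS.values())  # 61


def dpc(seq):
    """Dipeptide composition: count of each dipeptide divided by (L - 1)."""
    total = len(seq) - 1
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


def dde(seq):
    """Dipeptide Deviation from Expected Mean (assumed standard definition)."""
    total = len(seq) - 1
    features = []
    for dipeptide, dc in zip(DIPEPTIDES, dpc(seq)):
        # Theoretical mean from codon usage of the two residues.
        tm = (CODONS[dipeptide[0]] / SENSE_CODONS) * (CODONS[dipeptide[1]] / SENSE_CODONS)
        # Theoretical variance of the frequency over (L - 1) dipeptide positions.
        tv = tm * (1 - tm) / total
        features.append((dc - tm) / math.sqrt(tv))
    return features


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

Each sequence (or sequence window around a candidate lysine) yields a 400-dimensional vector under either encoding. Such vectors could then be passed to a gradient-boosting classifier, for example `xgboost.XGBClassifier`, and scored by 5-fold cross-validation as the abstract describes; the windowing and hyperparameters are not stated in the abstract and are left out here.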


Source journal: Journal of Mountain Area Research

Self-citation rate: 0.00%

Review time: 12 weeks