Damayanti, Sutyarso, Akmal Junaidi, F. R. Lumbanraja
Model Classification for Predicting the Post-Translational Modification (PTM) Glycosylation in Sequence O Using an Extreme Gradient Boosting Algorithm
Journal of Computer Science, published 2024-07-01. DOI: 10.3844/jcssp.2024.758.767 (https://doi.org/10.3844/jcssp.2024.758.767)
Abstract
Post-translational modification (PTM) is an important mechanism for regulating protein function. It refers to the covalent, enzymatic modification of proteins during protein biosynthesis and plays an important role in modifying protein function and regulating gene expression. One such modification is glycosylation, the addition of a sugar group to a protein structure. One type of glycosylation occurs in sequence O. Glycosylation has been linked to several illnesses, including diabetes, cancer, and influenza. It is therefore important to predict whether data are glycosylated or non-glycosylated. Glycosylation has mostly been identified with manual laboratory techniques, which are slow and require expensive lab equipment. A computational method that predicts glycosylation more quickly is therefore needed. The data used are glycosylation data on sequence O obtained from the openly accessible UniProt website. This study aimed to improve the accuracy of predicting PTM glycosylation in sequence O using extreme gradient boosting (XGBoost), a gradient boosting framework that tends to be faster. To increase accuracy, feature extraction experiments were conducted with the following feature types: AAIndex, hydrophobicity, SABLE, composition, CTD, and PseAAC. Features were selected with the MRMR (minimum redundancy maximum relevance) approach, and the models were evaluated using k-fold cross-validation. The results show a prediction accuracy of 100% for PTM glycosylation in sequence O. These findings indicate that the XGBoost algorithm performs better than the methods reported in previous studies.
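Two of the steps named in the abstract, composition features and k-fold cross-validation, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy fragments, and the choice of k = 3 are hypothetical, the folds are contiguous and unshuffled, and the abstract does not state the number of folds or how the composition feature was computed. In a full pipeline, the resulting feature vectors would be passed to a feature selector and a gradient boosting classifier, which are omitted here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`."""
    counts = Counter(sequence.upper())
    # Count only standard residues so the fractions sum to 1.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical toy fragments for illustration; these are not UniProt records.
fragments = ["MSTPTSA", "GAVLKKE", "TTSSPPA", "LLKEEGA", "PSTTAPS", "KKLLEEV"]
features = [amino_acid_composition(s) for s in fragments]
folds = list(k_fold_indices(len(features), k=3))
```

Each sample appears in exactly one test fold, so every sequence is scored once by a model that did not see it during training. With real data, the samples would normally be shuffled (and often stratified by class) before splitting.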
About the journal:
Journal of Computer Science aims to publish research articles on the theoretical foundations of information and computation, and on practical techniques for their implementation and application in computer systems. JCS is published twelve times a year and is a peer-reviewed journal covering the latest and most compelling research of the time.