Stacking-Kcr: A Stacking Model for Predicting the Crotonylation Sites of Lysine by Fusing Serial and Automatic Encoder

IF 2.4, Zone 3 (Biology), JCR Q3 (Biochemical Research Methods)

Current Bioinformatics Pub Date : 2023-12-04 DOI:10.2174/0115748936272040231117114252

Ying Liang, Suhui Li, Xiya You, You Guo, Jianjun Tang


Citations: 0

Abstract

Background: Protein lysine crotonylation (Kcr), a recently discovered post-translational modification (PTM), is typically localized at transcription start sites and regulates gene expression. It is associated with a variety of pathological conditions, such as developmental defects and malignant transformation.

Objective: Identifying Kcr sites helps to elucidate the biological mechanism of crotonylation and to develop new drugs for related diseases. However, traditional experimental methods for identifying Kcr sites are expensive and inefficient, which calls for new computational techniques.

Methods: To identify Kcr sites accurately, we propose an ensemble learning model called Stacking-Kcr. First, features are extracted from sequence information, physicochemical properties, and sequence fragment similarity. Next, the sequence information and physicochemical property features are fused in two ways, using an automatic encoder (autoencoder) and serial fusion, respectively. Finally, the two fused feature sets and the sequence fragment similarity features are each fed into four base classifiers, and a meta-classifier built on these first-level predictions produces the final prediction.

Results: In five-fold cross-validation, the model achieved an accuracy of 0.828 and an AUC of 0.910, showing a clear advantage over traditional machine learning methods. On independent test sets, Stacking-Kcr achieved an accuracy of 84.89% and an AUC of 92.21%, which are 1.7% and 0.8% higher, respectively, than those of other state-of-the-art tools. Additionally, Stacking-Kcr trained on phosphorylation sites outperformed the current model.

Conclusion: These results provide further evidence that Stacking-Kcr has strong application potential and generalization performance.
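The stacking procedure described under Methods and the metrics reported under Results can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the `CentroidScorer` base learner, the `LogisticMeta` meta-classifier, and all function names are hypothetical. The sketch uses one toy base learner per feature view instead of four base classifiers, and it omits the autoencoder fusion entirely. It assumes that "serial" fusion means plain concatenation of feature vectors.

```python
import math
import random

def serial_fuse(seq_feats, phys_feats):
    """Serial fusion (assumed here to mean concatenation of two feature vectors)."""
    return list(seq_feats) + list(phys_feats)

class CentroidScorer:
    """Hypothetical toy base learner: scores by relative distance to class centroids."""
    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        # Close to the positive centroid (small d1) gives a score near 1.
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

class LogisticMeta:
    """Hypothetical meta-classifier: logistic regression fitted by plain SGD."""
    def fit(self, Z, y, lr=0.1, epochs=200):
        self.w = [0.0] * len(Z[0])
        self.b = 0.0
        for _ in range(epochs):
            for z, t in zip(Z, y):
                err = self.predict_proba(z) - t
                self.w = [w - lr * err * v for w, v in zip(self.w, z)]
                self.b -= lr * err
        return self

    def predict_proba(self, z):
        s = self.b + sum(w * v for w, v in zip(self.w, z))
        return 1.0 / (1.0 + math.exp(-s))

def out_of_fold_scores(views, y, k=5):
    """First level: per view, train on k-1 folds and score the held-out fold."""
    n = len(y)
    Z = [[0.0] * len(views) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        for j, X in enumerate(views):
            model = CentroidScorer().fit([X[i] for i in train], [y[i] for i in train])
            for i in test:
                Z[i][j] = model.predict_proba(X[i])
    return Z

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-ins for the three feature types (not real Kcr data).
random.seed(0)
y = [i % 2 for i in range(40)]
seq = [[random.gauss(t, 1.0) for _ in range(4)] for t in y]
phys = [[random.gauss(2 * t, 1.0) for _ in range(3)] for t in y]
sim = [[random.gauss(t, 0.5) for _ in range(2)] for t in y]

# Two views: serially fused sequence + physicochemical features, and similarity features.
views = [[serial_fuse(a, b) for a, b in zip(seq, phys)], sim]
Z = out_of_fold_scores(views, y)   # first-level predictions
meta = LogisticMeta().fit(Z, y)    # second-level (meta) classifier

# Evaluated on the same samples the meta-classifier was fitted on (illustration only).
meta_scores = [meta.predict_proba(z) for z in Z]
accuracy = sum((s >= 0.5) == bool(t) for s, t in zip(meta_scores, y)) / len(y)
auc_value = auc(meta_scores, y)
print(f"accuracy={accuracy:.3f} AUC={auc_value:.3f}")
```

The key design point is that the meta-classifier only ever sees out-of-fold scores, so no base learner scores a sample it was trained on. A faithful reproduction would evaluate the meta-classifier on a held-out independent test set, as the abstract reports, rather than on its own training samples.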




Source journal: Current Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 6.60

Self-citation rate: 2.50%

Review time: >12 weeks

Journal description: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely, in-depth/mini-reviews, research papers and guest edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science. The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.