Improved Prediction of Glutarylation PTM Site using Evolutionary Features with LightGBM Resolving Data Imbalance Issue

S. M. Shovan, Md. Al Mehedi Hasan, M. Islam
Published in: 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), 2021-02-27
DOI: 10.1109/ICICT4SD50815.2021.9396995 (https://doi.org/10.1109/ICICT4SD50815.2021.9396995)

Abstract

Glutarylation is a relatively new lysine-specific post-translational modification (PTM) that regulates various biochemical processes and biological activities in living cells. Methods for identifying glutarylation sites are still at an early stage. Mass spectrometry, the laboratory method, is time-consuming, costly, and labour-intensive, so an accurate computational tool can reduce both time and expense. Instead of amino acid frequency-based features, we extract features from peptides using evolutionary information, namely mutation. Cluster centroid-based undersampling is used to handle the class imbalance while preserving the most useful information from the majority class. LightGBM, a decision tree-based boosting classifier, is chosen because it performed best among the classifiers compared. As a result, we achieved accuracy, sensitivity, and specificity of 76.65%, 72.35%, and 80.96% in 10-fold cross-validation, and 79.5%, 75%, and 84% on the independent test set, respectively. Our model thus outperforms the recently developed tools RF-GlutarySite and MDDGlutar.
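The cluster centroid-based undersampling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which implementation the authors used, so everything below is an assumption made for illustration: the function names, the plain k-means loop, the fixed seed, and the toy two-dimensional data are all hypothetical. The idea is to cluster the majority class into as many clusters as there are minority samples and keep only the cluster centroids, so that both classes end up the same size.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans_centroids(points, k, iterations=50, seed=0):
    """Plain k-means (hypothetical helper); returns k centroids."""
    rng = random.Random(seed)
    # Start from k distinct samples drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cluster_centroid_undersample(majority, minority, seed=0):
    """Shrink the majority class to len(minority) centroid vectors."""
    k = len(minority)
    if len(majority) <= k:
        # Nothing to undersample; return a copy unchanged.
        return [list(p) for p in majority]
    return kmeans_centroids(majority, k, seed=seed)


# Toy data (made up): six majority vectors, two minority vectors.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
            [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
minority = [[1.0, 1.0], [4.0, 4.0]]

reduced = cluster_centroid_undersample(majority, minority)
print(len(reduced), "majority centroids for", len(minority), "minority samples")
```

Note that the retained centroids are synthetic averages rather than original samples. Libraries such as imbalanced-learn provide a `ClusterCentroids` sampler for the same purpose, which would normally be preferred over a hand-written loop.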