Improved Prediction of Glutarylation PTM Site using Evolutionary Features with LightGBM Resolving Data Imbalance Issue
S. M. Shovan, Md. Al Mehedi Hasan, M. Islam
2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), 2021-02-27
DOI: 10.1109/ICICT4SD50815.2021.9396995 (https://doi.org/10.1109/ICICT4SD50815.2021.9396995)
Glutarylation is a relatively new lysine-specific post-translational modification (PTM) that regulates various biochemical processes and biological activities in living cells. Methods for identifying glutarylation sites are still immature. Mass spectrometry, the laboratory method, is time-consuming, costly, and labour-intensive, so an accurate computational tool can reduce both time and expense. Instead of amino acid frequency-based features, we extract features from peptides using evolutionary information in the form of mutation. To handle the class imbalance, we use cluster centroid-based undersampling, which preserves the most useful information from the majority class. LightGBM, a decision tree-based boosting classifier, is chosen because it performed best among the classifiers compared. We achieved accuracy, sensitivity, and specificity of 76.65%, 72.35%, and 80.96%, respectively, in 10-fold cross-validation and 79.5%, 75%, and 84%, respectively, on the independent test set. Our model thus surpasses the recently developed tools RF-GlutarySite and MDDGlutar.
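To make two of the steps named in the abstract concrete, the following is a minimal, dependency-free sketch of cluster centroid undersampling and of the three reported metrics. It is not the authors' code: the function names (`kmeans_centroids`, `cluster_centroid_undersample`, `binary_metrics`), the deterministic centroid initialisation, and the fixed iteration count are assumptions made for illustration, and the LightGBM classifier itself is omitted. In practice a library implementation of both steps would likely be used instead.

```python
def kmeans_centroids(points, k, iters=20):
    """Plain Lloyd's k-means; returns k centroids as lists of floats.

    Assumption: centroids are initialised from evenly spaced input points
    so the sketch stays deterministic.
    """
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_centroid_undersample(majority, minority):
    """Replace the majority class by as many k-means centroids as there are
    minority samples, so both classes end up the same size."""
    return kmeans_centroids(majority, len(minority)), list(minority)


def binary_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for labels in {0, 1}.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity
```

Here sensitivity is the fraction of true glutarylation sites that are recovered and specificity is the fraction of non-sites that are correctly rejected. Both classes must be predicted well for the two to be high at once, which is why balancing the training set matters.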