Gradient boosting model for unbalanced quantitative mass spectra quality assessment
Long Chen, T. Zhang, Tianjun Li
2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), published 2017-12-01
DOI: 10.1109/SPAC.2017.8304311 (https://doi.org/10.1109/SPAC.2017.8304311)
Citations: 5
Abstract
A method for quality control of isotope-labeled mass spectra is described. In such spectra, the profiles of labeled (heavy) and unlabeled (light) peptide pairs provide valuable information about the biological samples studied under different conditions. The core task of quality control in a quantitative LC-MS experiment is to filter out low-quality spectra, or peptides with erroneous profiles. The most commonly used approach is to train a classifier that separates the spectra into positive (high-quality) and negative (low-quality) classes. However, because error profiles are few, the training data are dominated by positive samples, i.e., the class imbalance problem. We therefore employ the synthetic minority over-sampling technique (SMOTE) to handle the unbalanced data and then apply an extreme gradient boosting (XGBoost) model as the classifier. We assessed samples with different heavy-to-light peptide ratios using the trained XGBoost classifier and found that the SMOTE-XGBoost classifier significantly increases the reliability of peptide ratio estimates.
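The over-sampling step mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch of SMOTE. This is not the authors' implementation: the function name `smote`, the two-dimensional toy features, and the parameter values are hypothetical, chosen only to show how synthetic minority (low-quality) samples are interpolated between existing minority samples and their nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy features (not data from the paper).
# Label 1 = high-quality spectrum (majority), label 0 = low-quality (minority).
high_quality = [[0.9 + 0.01 * i, 0.8 - 0.01 * i] for i in range(20)]
low_quality = [[0.2, 0.1], [0.25, 0.15], [0.3, 0.05], [0.15, 0.2]]

# Add enough synthetic low-quality samples to balance the two classes.
new_low = smote(low_quality, n_synthetic=len(high_quality) - len(low_quality), k=3)
X = high_quality + low_quality + new_low
y = [1] * len(high_quality) + [0] * (len(low_quality) + len(new_low))
print(sum(y), len(y) - sum(y))  # class counts after balancing
```

In a pipeline like the one the abstract describes, the balanced `X` and `y` would then be used to train the gradient boosting classifier. Over-sampling is normally applied to the training split only, so that the evaluation data keep the original class ratio.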