An improved ensemble learning method with SMOTE for protein interaction hot spots prediction
Authors: Qianqian Huang, Xiaolong Zhang
Published in: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2016-12-01
DOI: 10.1109/BIBM.2016.7822756 (https://doi.org/10.1109/BIBM.2016.7822756)
Citations: 11
Abstract
In protein-protein interactions, only a small subset of residues, the hot spots, contributes significantly to the binding free energy. As a result, the numbers of hot spots and non-hot spots are imbalanced. Predicting hot spot residues is very important in the study of protein-protein interactions. This paper presents an improved ensemble learning method, AdaBoost combined with SMOTE, to handle the imbalanced data and predict protein hot spots in the latest database, SKEMPI. First, amino acid information, such as hydrophobicity, and protein structural features are extracted. Then the mRMR algorithm is used to select features. Finally, the dataset is processed with SMOTE to address the class imbalance, and the hot spots are predicted with the AdaBoost ensemble learning method. Experimental results show that the proposed method can improve prediction accuracy.
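
To make the oversampling step concrete, the following is a minimal sketch of the SMOTE idea in plain Python, not the authors' implementation. It assumes Euclidean distance, picks the base sample at random rather than cycling through every minority sample, and uses made-up two-feature vectors in place of features extracted from SKEMPI. The names `smote`, `hot_spots`, and `non_hot_spots` are hypothetical. Feature extraction, mRMR selection, and the AdaBoost classifier are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the synthetic point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumption): illustrative feature vectors, not values from SKEMPI.
hot_spots = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
non_hot_spots = [[5.0 + 0.1 * i, 6.0 - 0.1 * i] for i in range(12)]

# Generate enough synthetic hot spots to match the majority class size.
new_samples = smote(hot_spots, n_new=len(non_hot_spots) - len(hot_spots), k=3)
balanced_hot_spots = hot_spots + new_samples
```

Because each synthetic sample lies on a segment between two real minority samples, the oversampled class stays inside the region already occupied by the hot spots. In a full pipeline, oversampling would normally be applied to the training split only, so that no synthetic points leak into the evaluation data.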