Authors: M. Bhardwaj, D. Dash, Vasudha Bhatnagar
Venue: 2015 IEEE International Conference on Data Mining Workshop (ICDMW)
Published: 2015-11-14
DOI: 10.1109/ICDMW.2015.229 (https://doi.org/10.1109/ICDMW.2015.229)
Accurate Classification of Biological Data Using Ensembles
Predicting the class to which a given protein sequence belongs is a challenging research area in bioinformatics. Machine learning techniques have been successfully applied to protein prediction problems such as allergen prediction, mitochondrial prediction, and toxin prediction. Physicochemical properties derived from amino acid sequences are commonly used for this purpose. In this paper, we propose an SVM-based ensemble method for classifying protein datasets. The constituent classifiers of the ensemble are generated sequentially, each one attempting to rectify the mistakes made by the previous one. The ensemble is called the Self-Chastising Ensemble (SCE) because of the iterative refinement each classifier carries out over its predecessor. We present two versions of the algorithm: SCE-Bal for balanced datasets and SCE-Imbal for imbalanced datasets. Empirical results demonstrate that the algorithm, using simple and computationally efficient features (amino acid composition and dipeptide composition), outperforms other machine learning methods that use complex feature sets.
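
The abstract names amino acid composition and dipeptide composition as its features but does not define them. The sketch below uses their usual definitions: the fraction of each of the 20 standard residues, and the fraction of each of the 400 ordered pairs of adjacent residues. The function names and the handling of non-standard residues (they are skipped) are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order: AA, AC, ..., YY.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue fractions.

    Assumption: characters outside the 20 standard residues are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of adjacent-pair fractions.

    Assumption: a pair containing a non-standard residue is not counted.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Concatenating the two vectors gives a fixed-length, 420-dimensional representation of any protein sequence, regardless of its length, which is the kind of input an SVM requires.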