New Heuristic Methods for Protein Model Quality Assessment via Two-Stage Machine Learning and Hierarchical Ensemble
Junlin Wang, Wenbo Wang, Yingzi Shang, Dong Xu
2022 IEEE 4th International Conference on Cognitive Machine Intelligence (CogMI), published 2022-12-01
DOI: 10.1109/CogMI56440.2022.00022
Computational protein structure prediction is an important problem in bioinformatics, and the ability to accurately evaluate the quality of predicted protein models is of significant interest. In this paper, three new single-model quality assessment (QA) methods, MMQA-1, MMQA-2, and MMQA-HE, are proposed based on two-stage machine learning and hierarchical ensemble techniques. MMQA-1 and MMQA-2 train different machine learning models in two separate stages: they divide the entire feature set into two groups and use a completely different feature group and training set in each stage to train a predictive model. MMQA-HE is an ensemble method that combines individual models not only at the tree level but also at the forest level. In CASP14, MMQA-1 ranked second in terms of average GDT-TS difference. MMQA-2 and MMQA-HE improve on MMQA-1 and outperform existing state-of-the-art QA methods across multiple QA performance metrics.
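The abstract does not say which learners MMQA uses, how the features are split, or how the two stages are coupled. The Python sketch below illustrates the two ideas only under explicit assumptions: every "tree" is a depth-1 regression tree (a stump), stage 2 is fitted to the residuals left by stage 1 on a separate training set, and the hierarchical ensemble is a plain average of trees within each forest followed by an average across forests. All function names (`fit_stump`, `fit_forest`, `hierarchical_ensemble`, `two_stage`) and the toy data are hypothetical, not the authors' implementation.

```python
import random
from statistics import mean

# Hypothetical sketch, not the authors' implementation. Assumptions: each "tree"
# is a depth-1 regression tree (stump); stage 2 is fitted to stage-1 residuals.


def fit_stump(X, y, feature):
    """Fit a depth-1 regression tree on one feature (best split by squared error)."""
    values = sorted({row[feature] for row in X})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent distinct values
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # constant feature: no split possible, predict the mean
        m = mean(y)
        return lambda row: m
    _, t, ml, mr = best
    return lambda row: ml if row[feature] <= t else mr


def fit_forest(X, y, features, n_trees, rng):
    """Bagged stumps; the forest averages its trees (tree-level combination)."""
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feature = rng.choice(features)  # random feature from this stage's group
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return lambda row: mean(tree(row) for tree in trees)


def hierarchical_ensemble(forests):
    """Forest-level combination: average the already tree-averaged forests."""
    return lambda row: mean(forest(row) for forest in forests)


def two_stage(data1, data2, group_a, group_b, n_forests=3, n_trees=20, seed=0):
    """Stage 1: feature group A on data1. Stage 2: feature group B on data2,
    fitted to the residuals of stage 1 (an assumed coupling)."""
    rng = random.Random(seed)
    X1, y1 = data1
    X2, y2 = data2
    stage1 = hierarchical_ensemble(
        [fit_forest(X1, y1, group_a, n_trees, rng) for _ in range(n_forests)])
    residuals = [yi - stage1(row) for row, yi in zip(X2, y2)]
    stage2 = hierarchical_ensemble(
        [fit_forest(X2, residuals, group_b, n_trees, rng) for _ in range(n_forests)])
    return lambda row: stage1(row) + stage2(row)


if __name__ == "__main__":
    data_rng = random.Random(1)

    def make_split(n):
        # Toy data: 4 features and a synthetic "quality score" in [0, 1].
        X = [[data_rng.random() for _ in range(4)] for _ in range(n)]
        y = [0.6 * r[0] + 0.4 * r[2] for r in X]
        return X, y

    model = two_stage(make_split(40), make_split(40), group_a=[0, 1], group_b=[2, 3])
    print(round(model([0.9, 0.1, 0.9, 0.1]), 3))
```

Averaging inside each forest and then across forests is what makes the combination hierarchical in this sketch; weighted averaging or stacking at either level would be equally plausible readings of the abstract.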