Comparative study of classifiers for human microbiome data
Xu-Wen Wang, Yang-Yu Liu
Medicine in Microecology, volume 4, Article 100013. Published 2020-06-01.
DOI: 10.1016/j.medmic.2020.100013
Accumulated evidence has shown that commensal microorganisms play key roles in human physiology and disease. Dysbiosis of human-associated microbial communities, often referred to as the human microbiome, has been associated with many diseases. Applying supervised classification analysis to human microbiome data can help identify subsets of microorganisms that are highly discriminative, and hence build prediction models that can accurately classify unlabeled samples. Here, we systematically compare two state-of-the-art ensemble classifiers, Random Forests (RF) and eXtreme Gradient Boosting decision trees (XGBoost), with two traditional methods, the elastic net (ENET) and Support Vector Machine (SVM), in the classification analysis of 29 benchmark human microbiome datasets. We find that XGBoost outperforms all other methods in only a few benchmark datasets. In the remaining benchmark datasets, XGBoost, RF, and ENET display comparable performance. The training time of XGBoost is much longer than that of the other methods, partially due to its much larger number of hyperparameters. We also find that the most important features selected by the four classifiers partially overlap. Yet the differences in their classification performance are almost independent of this overlap.
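The abstract does not say how the overlap between the classifiers' most important features is measured. As a minimal sketch, one plausible way is to take the top-k features of each model and compute the pairwise Jaccard index. The choice of Jaccard, the value of k, the helper names, and the importance scores below are all illustrative assumptions, not the authors' method or data.

```python
from itertools import combinations


def top_k_features(importances, k):
    """Return the set of the k feature names with the largest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; defined here as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(importances_by_model, k):
    """Jaccard index of the top-k feature sets for every pair of models."""
    tops = {name: top_k_features(imp, k) for name, imp in importances_by_model.items()}
    return {
        (m1, m2): jaccard(tops[m1], tops[m2])
        for m1, m2 in combinations(sorted(tops), 2)
    }


# Hypothetical, made-up importance scores (NOT taken from the paper).
# For a linear model such as ENET, absolute coefficients could play this role.
scores = {
    "RF": {"taxon_A": 0.40, "taxon_B": 0.30, "taxon_C": 0.20, "taxon_D": 0.10},
    "XGBoost": {"taxon_A": 0.50, "taxon_C": 0.25, "taxon_D": 0.15, "taxon_B": 0.10},
    "ENET": {"taxon_B": 0.90, "taxon_D": 0.60, "taxon_A": 0.20, "taxon_C": 0.10},
}

overlap = pairwise_overlap(scores, k=2)
for pair, value in overlap.items():
    print(pair, round(value, 2))
```

A value of 1 means two classifiers select identical top-k sets and 0 means the sets are disjoint. Comparing these values with the gap in classification performance for each pair of models is one way to check whether performance differences track feature overlap.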