Characterizing phenotypes for Mental Health Disorders with Wrapper-typed Feature selection techniques: Comparison of RF-RFE and Fuzzy Forest
Author: Jiemiao Chen
Published in: 2022 14th International Conference on Machine Learning and Computing (ICMLC), 2022-02-18
DOI: https://doi.org/10.1145/3529836.3529837
Random Forest is a popular feature selection method that handles “small n, large p” problems well but cannot deal with collinearity. To close this gap in removing highly correlated features, two wrapper-type feature selection methods have been developed: Fuzzy Forest (FF) and Random Forest-Recursive Feature Elimination (RF-RFE). The two methods are similar in several ways but use different strategies for handling features. Meanwhile, the field of clinical psychiatry is changing the way it characterizes mental health disorders. The aims of our paper are therefore to compare the impact of FF and RF-RFE and to study the phenotypic features relevant to three mental health disorders: schizophrenia, bipolar disorder, and attention-deficit/hyperactivity disorder (ADHD). We frame the classification problem as “one versus rest” (OVR) and use phenotype data from the Consortium for Neuropsychiatric Phenomics. FF and RF-RFE are each applied to select an optimal feature subset, and the selected subsets are then evaluated with Support Vector Machine (SVM) and Extreme Learning Machine (ELM) classifiers. The evaluation criteria are precision, recall, accuracy, F-measure, and area under the curve (AUC). RF-RFE showed better feature selection performance than FF. We also found that the features of the original data are most informative for schizophrenia, followed by ADHD, and least informative for bipolar disorder.
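The “one versus rest” framing and the five evaluation criteria named above can be illustrated with a small sketch. This is not the author's implementation: the diagnosis labels (including a "control" class), the classifier scores, and the 0.5 decision threshold are all made-up assumptions, and the function names `one_vs_rest`, `roc_auc`, and `evaluate` are hypothetical helpers introduced only for this example.

```python
from typing import Dict, List, Sequence


def one_vs_rest(labels: Sequence[str], positive: str) -> List[int]:
    """Relabel a multi-class target: 1 for `positive`, 0 for every other class."""
    return [1 if label == positive else 0 for label in labels]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute precision, recall, accuracy, F-measure, and AUC for one binary task."""
    # Turn continuous scores into hard predictions (threshold is an assumption).
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / len(y_true),
        "f_measure": f_measure,
        "auc": roc_auc(y_true, scores),
    }


# Hypothetical diagnoses and classifier scores for the "schizophrenia vs rest" task.
diagnoses = ["schizophrenia", "bipolar", "adhd", "schizophrenia", "control", "adhd"]
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3]

y_true = one_vs_rest(diagnoses, positive="schizophrenia")
metrics = evaluate(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Repeating the relabeling with `positive="bipolar"` and `positive="adhd"` would give the other two OVR tasks. In a real comparison, each task would be scored once per feature subset and classifier.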