Feature Selection for the Classification of Longitudinal Human Ageing Data

Tossapol Pomsuwan, A. Freitas

2017 IEEE International Conference on Data Mining Workshops (ICDMW), November 2017. DOI: 10.1109/ICDMW.2017.102

Abstract: We propose a new variant of the Correlation-based Feature Selection (CFS) method for coping with longitudinal data, in which variables are repeatedly measured across different time points. The proposed CFS variant is evaluated on ten datasets created using data from the English Longitudinal Study of Ageing (ELSA), with different age-related diseases used as the class variables to be predicted. The results show that, overall, the proposed CFS variant leads to better predictive performance than the standard CFS and the baseline approach of no feature selection, when using Naïve Bayes and J48 decision tree induction as classification algorithms (although the difference in performance is very small in the results for J48). We also report the most relevant features selected by J48 across the datasets.