Hany Al Ashwal, Areeg S. Abdalla, M. E. Halaby, A. Moustafa
Feature Selection for the Classification of Alzheimer's Disease Data
Proceedings of the 3rd International Conference on Software Engineering and Information Management, published 2020-01-12
DOI: https://doi.org/10.1145/3378936.3378982
Citations: 1
Abstract
In this paper, we describe the features of our large dataset (6400+ rows and 400+ features), which includes Alzheimer's disease (AD) patients, individuals with mild cognitive impairment (MCI, the prodromal stage of Alzheimer's disease), and healthy individuals (without AD or MCI). We also present a feature selection method applied to the dataset. Unlike the datasets used in prior data mining models of AD, ours is large and includes genetic, neural, nutritional, and cognitive measures for every individual. Empirical studies have shown all of these measures to be related to the development of AD. We used a random forest classifier to discover which features best differentiate AD patients from healthy individuals. Identifying these features will likely provide evidence for protective factors against the development of AD.
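As an illustration of how a random forest can be used to rank features, the following is a minimal sketch rather than the authors' pipeline. It assumes scikit-learn and uses impurity-based importances on synthetic stand-in data; the abstract does not specify the importance measure, the hyperparameters, or the preprocessing, so the function name `rank_features`, the feature names, and all parameter values here are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, feature_names, n_trees=200, seed=0):
    """Fit a random forest and return (name, importance) pairs, highest first.

    Uses scikit-learn's impurity-based feature_importances_; this is an
    assumed choice, not one confirmed by the paper.
    """
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]  # indices sorted by descending importance
    return [(feature_names[i], float(importances[i])) for i in order]


# Synthetic binary-classification data standing in for the AD vs. healthy task:
# 600 rows and 40 features, of which 5 carry signal (not the real dataset).
X, y = make_classification(
    n_samples=600,
    n_features=40,
    n_informative=5,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)
names = [f"feature_{i}" for i in range(X.shape[1])]

ranking = rank_features(X, y, names)
for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favor high-cardinality features, so on a mixed dataset like the one described, a permutation-based measure on held-out data may be a useful cross-check.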