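A Comparative Study of Feature Selection Methods for Classification of Chest X-Ray Image as Normal or Abnormal Inside AWS ECS Cluster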

Vaibhav Sanjay Lalka, Srinivasa Rao Kundeti, Vinod Kumar, V. J.
Published in: 2018 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM)
Publication date: 2018-11-01
DOI: 10.1109/CCEM.2018.00011 (https://doi.org/10.1109/CCEM.2018.00011)
Citations: 4

Abstract

Machine learning algorithms are used to discover complex nonlinear relationships in biomedical data. However, sophisticated learning models become computationally infeasible as the dimensionality of the data increases. One way to overcome this problem is to use feature selection methods. A feature selection method searches for the optimal feature subset and evaluates each candidate subset against an evaluation criterion; these methods are categorized as filter, wrapper, embedded, and hybrid approaches. Although these methods reduce the dimensionality of the data, training time still grows with the size of the dataset. In addition, data is now most often stored in the cloud, so the first step before applying machine learning algorithms is to copy the data to a local machine, which can take a long time when the dataset is large. To overcome these problems, we propose a pipeline that runs on an AWS cloud-based distributed architecture and performs feature selection, training, and classification. We define an evaluation criterion that measures the performance of a feature subset based on its classification accuracy and its size. The experiments were carried out on two chest X-ray datasets (Shenzhen and NIH) whose images are clinically labeled as normal or abnormal. Using the hybrid feature selection approach with the defined evaluation criterion, we achieved a classification accuracy of 84.24% on the Shenzhen dataset and 79.55% on the NIH dataset for classifying chest X-ray images as normal or abnormal, while reducing the feature subset size by more than 50%.
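
The abstract does not state the evaluation criterion itself, so the following is only a minimal sketch of one way a criterion could combine classification accuracy with feature subset size. The weighted-sum form, the weight `alpha`, the names `SubsetResult`, `subset_score`, and `best_subset`, and every number in the usage example are assumptions made for illustration; none of them come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetResult:
    method: str      # label for the selection approach, e.g. "filter" or "hybrid"
    accuracy: float  # classification accuracy in [0, 1]
    n_selected: int  # number of features kept in the subset


def subset_score(result: SubsetResult, n_total: int, alpha: float = 0.8) -> float:
    """Hypothetical criterion: weighted sum of accuracy and feature reduction.

    alpha weights accuracy; (1 - alpha) weights the fraction of features removed.
    This form is an assumption, not the criterion defined in the paper.
    """
    if not 0 < result.n_selected <= n_total:
        raise ValueError("n_selected must be between 1 and n_total")
    reduction = 1.0 - result.n_selected / n_total
    return alpha * result.accuracy + (1.0 - alpha) * reduction


def best_subset(results, n_total: int, alpha: float = 0.8) -> SubsetResult:
    """Return the candidate subset with the highest hypothetical score."""
    return max(results, key=lambda r: subset_score(r, n_total, alpha))


if __name__ == "__main__":
    # Placeholder numbers for illustration only; they are not results from the paper.
    candidates = [
        SubsetResult("filter", accuracy=0.80, n_selected=60),
        SubsetResult("wrapper", accuracy=0.83, n_selected=70),
        SubsetResult("hybrid", accuracy=0.82, n_selected=40),
    ]
    winner = best_subset(candidates, n_total=100)
    print(winner.method, round(subset_score(winner, n_total=100), 3))
```

With a criterion of this shape, a subset with slightly lower accuracy can still rank first if it removes many more features; how strongly that trade-off is weighted depends entirely on the assumed `alpha`.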