Undiagnosed samples aided rough set feature selection for medical data

2012 2nd IEEE International Conference on Parallel, Distributed and Grid Computing. Pub Date: 2012-12-01. DOI: 10.1109/PDGC.2012.6449895

D. Guan, Weiwei Yuan, Zilong Jin, Sungyoung Lee


Citations: 4

Abstract



Medical data often consist of a large number of disease markers. For medical data analysis, some disease markers are unhelpful and sometimes even have negative effects. Feature selection is therefore necessary, as it can remove those unimportant markers. Among the many feature selection methods, rough set based feature selection (RSFS) has been widely used. Unlike other methods, RSFS is completely data-driven: it does not require any additional information such as probability distributions. Traditional RSFS methods extract information only from diagnosed samples, so they usually require a large number of diagnosed samples to achieve good feature selection performance. In many real medical applications, however, diagnosed samples are limited while undiagnosed samples are plentiful. Motivated by semi-supervised learning, we propose a novel RSFS method that can learn from both diagnosed and undiagnosed samples, called undiagnosed samples aided rough set feature selection (USA-RSFS). Its main benefit is to reduce the number of diagnosed samples required, with the help of undiagnosed ones. Finally, the promising performance of USA-RSFS is validated through a set of experiments on medical datasets.
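To make "completely data-driven" concrete, the following is a minimal sketch of classic rough set feature selection on diagnosed (labeled) samples only, using the standard dependency degree and a greedy forward search in the style of QuickReduct. It is not the USA-RSFS method, whose details the abstract does not give; the function names `dependency` and `quick_reduct` and the toy data are assumptions made for illustration, and markers are assumed to be already discretized.

```python
from collections import defaultdict


def dependency(rows, labels, features):
    """Rough set dependency degree of the labels on a feature subset.

    Samples that agree on every chosen feature form one equivalence class.
    The result is the fraction of samples lying in classes that carry a
    single label (the positive region).
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        classes[key].append(label)
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily add features until the subset reaches the dependency
    degree of the full feature set."""
    all_features = list(range(len(rows[0])))
    target = dependency(rows, labels, all_features)
    selected = []
    while dependency(rows, labels, selected) < target:
        candidates = [f for f in all_features if f not in selected]
        best = max(candidates,
                   key=lambda f: dependency(rows, labels, selected + [f]))
        selected.append(best)
    return selected


# Hypothetical toy data: three discretized markers per diagnosed sample.
# The diagnosis equals the marker at index 1; the other two are noise.
rows = [
    (0, 0, 1),
    (0, 1, 1),
    (1, 0, 0),
    (1, 1, 0),
    (0, 0, 0),
    (1, 1, 1),
]
labels = [0, 1, 0, 1, 0, 1]

print(quick_reduct(rows, labels))  # indices of the selected markers
```

Only counts over the observed samples are used here, with no distributional assumptions, which is the sense in which such a procedure is data-driven. It also suggests why few diagnosed samples are a problem: with little labeled data, the equivalence classes are sparsely populated and the dependency estimates become unreliable.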
