Feature list aggregation approaches for ensemble gene selection on patient response datasets

2013 IEEE 14th International Conference on Information Reuse & Integration (IRI). Pub Date: 2013-10-24. DOI: 10.1109/IRI.2013.6642488

T. Khoshgoftaar, Randall Wald, D. Dittman, Amri Napolitano

Abstract

Many cancer treatments destroy healthy cells along with cancerous ones, and can leave patients fatigued and with a compromised immune system. This makes it especially important to determine whether a given cancer treatment will work for the patient or will just cause further harm. Recently there has been work on using gene expression profiles (DNA microarrays) to predict how a patient will respond to a cancer treatment. However, these profiles suffer from high dimensionality, that is, a very large number of features (genes) per instance. This necessitates dimension-reducing techniques such as feature (gene) selection, a family of data preprocessing techniques from data mining that seek an ideal feature set. A particularly promising subset of feature selection techniques is ensemble feature selection, which performs multiple instances of feature selection and aggregates the results into a single decision. Traditionally, this is accomplished by ranking the features in each list by a metric and aggregating the ranks of each feature into a single final decision for the feature. Many forms of aggregation have been considered, both in terms of how to generate the distinct lists and how to combine the ranks from each list. However, all of these works have assumed that ranks must be created per list and then aggregated in a separate step, rather than aggregating the scores of each list directly and performing ranking only on the final list. This work compares two feature list aggregation approaches, rank-based aggregation and score-based aggregation, using the mean aggregation technique and evaluating them in terms of classification. We use fifteen patient response datasets along with three feature selection techniques as the basis for the ensemble feature selection, and we employ four feature subset sizes and two classifiers. Our results show that, in general, the rank-based aggregation approach outperforms the score-based aggregation approach in a majority of scenarios for both classifiers. However, this is not always the case, and careful consideration is required before choosing between the two.
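
To make the distinction between the two approaches concrete, the following is a minimal Python sketch of mean aggregation done both ways. It is not the authors' implementation. The function names, the toy scores, the tie-breaking rule (ties keep feature order), and the assumption that all lists score features on a comparable scale where higher is better are all illustrative choices that do not come from the paper.

```python
def ranks_from_scores(scores):
    """Return 1-based ranks for one feature list; rank 1 is the highest score.

    Ties keep feature order (an assumption made for this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, feature in enumerate(order, start=1):
        ranks[feature] = position
    return ranks


def rank_based_aggregation(score_lists, k):
    """Rank every list first, then average the ranks of each feature."""
    rank_lists = [ranks_from_scores(scores) for scores in score_lists]
    n_features = len(score_lists[0])
    mean_rank = [
        sum(ranks[i] for ranks in rank_lists) / len(rank_lists)
        for i in range(n_features)
    ]
    # Lower mean rank is better.
    return sorted(range(n_features), key=lambda i: mean_rank[i])[:k]


def score_based_aggregation(score_lists, k):
    """Average the scores of each feature directly, then rank only once."""
    n_features = len(score_lists[0])
    mean_score = [
        sum(scores[i] for scores in score_lists) / len(score_lists)
        for i in range(n_features)
    ]
    # Higher mean score is better (assumes comparable score scales).
    return sorted(range(n_features), key=lambda i: -mean_score[i])[:k]


# Hypothetical scores for three features (columns) from three
# feature selection runs (rows); the values are made up.
score_lists = [
    [0.99, 0.60, 0.10],
    [0.50, 0.55, 0.10],
    [0.50, 0.55, 0.20],
]

print(rank_based_aggregation(score_lists, 2))   # [1, 0]
print(score_based_aggregation(score_lists, 2))  # [0, 1]
```

In this toy case feature 1 wins under rank-based aggregation because it ranks first in two of the three lists, while feature 0 wins under score-based aggregation because one very high score dominates its mean. This illustrates why the two approaches can select different feature subsets from the same lists.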
