Association rule based similarity measures for the clustering of gene expression data.

The Open Medical Informatics Journal. Pub Date: 2010-01-01. Epub Date: 2010-05-28. DOI: 10.2174/1874431101004010063
Prerna Sethi, Sathya Alagiriswamy
Pages: 63-73. Citation count: 16

Abstract

In life-threatening diseases such as cancer, where effective diagnosis includes annotation, early detection, distinction, and prediction, data mining and statistical approaches promise precise, accurate, and functionally robust analysis of gene expression data. The computational extraction of derived patterns from microarray gene expression data is a non-trivial task that involves sophisticated algorithm design and analysis for domain-specific discovery. In this paper, we propose a formal approach to feature extraction. We first apply feature selection heuristics based on three statistical impurity measures (the Gini Index, Max Minority, and the Twoing Rule) to obtain the top 100-400 genes. We then analyze the associative dependencies between the genes and assign weights to the genes based on their degree of participation in the resulting association rules. Next, we present weighted Jaccard and vector cosine similarity measures to compute the similarity between the discovered rules. Finally, we group the rules by applying hierarchical clustering. To demonstrate the usability and efficiency of our technique, we applied it to three publicly available multiclass cancer gene expression datasets and performed a biomedical literature search to support the effectiveness of our results.
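
The abstract does not give the exact formulas, so the following is only a minimal sketch of how the last three steps could fit together, under stated assumptions: each rule is reduced to the set of genes it mentions, a gene's weight is taken to be the fraction of rules it participates in, and the rules are grouped with a simple average-linkage agglomerative procedure. The gene names, the `RULES` data, and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy rules: each rule is the set of genes it mentions
# (antecedent and consequent merged). Gene names are illustrative only.
RULES = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g4", "g5"},
    {"g4", "g5", "g6"},
]

def gene_weights(rules):
    """Assumed weighting: fraction of rules in which each gene participates."""
    counts = Counter(g for rule in rules for g in rule)
    return {g: c / len(rules) for g, c in counts.items()}

def weighted_jaccard(a, b, w):
    """Weight of shared genes divided by weight of all genes in either rule."""
    union = sum(w[g] for g in a | b)
    return sum(w[g] for g in a & b) / union if union else 0.0

def weighted_cosine(a, b, w):
    """Cosine of the rules' weight vectors (gene weight if present, else 0)."""
    dot = sum(w[g] ** 2 for g in a & b)
    norm_a = sqrt(sum(w[g] ** 2 for g in a))
    norm_b = sqrt(sum(w[g] ** 2 for g in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_rules(rules, sim, k):
    """Average-linkage agglomerative clustering; stops when k clusters remain."""
    w = gene_weights(rules)
    clusters = [[i] for i in range(len(rules))]

    def linkage(c1, c2):
        # Mean pairwise similarity between the rules of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(sim(rules[i], rules[j], w) for i, j in pairs) / len(pairs)

    while len(clusters) > k:
        # Merge the two most similar clusters (x < y, so pop(y) keeps x valid).
        x, y = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] += clusters.pop(y)
    return [sorted(c) for c in clusters]

if __name__ == "__main__":
    print(cluster_rules(RULES, weighted_jaccard, k=2))
    print(cluster_rules(RULES, weighted_cosine, k=2))
```

On this toy input both similarity measures place the two rules built from g1 to g3 in one group and the two rules built from g4 to g6 in the other. A real analysis would more likely use a library routine for hierarchical clustering and cut the dendrogram at a chosen height; the loop above is only meant to make the grouping step concrete.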
