An optimized cluster validity index for identification of cancer mediating genes

IF 3 4区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS
Subir Hazra, Anupam Ghosh
{"title":"An optimized cluster validity index for identification of cancer mediating genes","authors":"Subir Hazra, Anupam Ghosh","doi":"10.1007/s11042-024-20105-1","DOIUrl":null,"url":null,"abstract":"<p>One of the major challenges in bioinformatics lies in identification of modified gene expressions of an affected person due to medical ailments. Focused research has been observed till date in such identification, leading to multiple proposals pivoting in clustering of gene expressions. Moreover, while clustering proves to be an effective way to demarcate the affected gene expression vectors, there has been global research on the cluster count that optimizes the gene expression variations among the clusters. This study proposes a new index called mean-max index (MMI) to determine the cluster count which divides the data collection into ideal number of clusters depending on gene expression variations. MMI works on the principle of minimization of the intra cluster variations among the members and maximization of inter cluster variations. In this regard, the study has been conducted on publicly available dataset comprising of gene expressions for three diseases, namely lung disease, leukaemia, and colon cancer. The data count for normal as well as diseased patients lie at 10 and 86 for lung disease patients, 43 and 13 for patients observed with leukaemia, and 18 and 18 for patients with colon cancer respectively. The gene expression vectors for the three diseases comprise of 7129,22283, and 6600 respectively. Three clustering models have been used for this study, namely k-means, partition around medoid, and fuzzy c-means, all using the proposed MMI technique for finalizing the cluster count. The Comparative analysis reflects that the proposed MMI index is able to recognize much more true positives (biologically enriched) cancer mediating genes with respect to other cluster validity indices and it can be considered as superior to other with respect to enhanced accuracy by 85%.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.0000,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Multimedia Tools and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11042-024-20105-1","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

One of the major challenges in bioinformatics lies in identification of modified gene expressions of an affected person due to medical ailments. Focused research has been observed till date in such identification, leading to multiple proposals pivoting in clustering of gene expressions. Moreover, while clustering proves to be an effective way to demarcate the affected gene expression vectors, there has been global research on the cluster count that optimizes the gene expression variations among the clusters. This study proposes a new index called mean-max index (MMI) to determine the cluster count which divides the data collection into ideal number of clusters depending on gene expression variations. MMI works on the principle of minimization of the intra cluster variations among the members and maximization of inter cluster variations. In this regard, the study has been conducted on publicly available dataset comprising of gene expressions for three diseases, namely lung disease, leukaemia, and colon cancer. The data count for normal as well as diseased patients lie at 10 and 86 for lung disease patients, 43 and 13 for patients observed with leukaemia, and 18 and 18 for patients with colon cancer respectively. The gene expression vectors for the three diseases comprise of 7129,22283, and 6600 respectively. Three clustering models have been used for this study, namely k-means, partition around medoid, and fuzzy c-means, all using the proposed MMI technique for finalizing the cluster count. The Comparative analysis reflects that the proposed MMI index is able to recognize much more true positives (biologically enriched) cancer mediating genes with respect to other cluster validity indices and it can be considered as superior to other with respect to enhanced accuracy by 85%.

Abstract Image

用于识别癌症介导基因的优化聚类有效性指数
生物信息学面临的主要挑战之一是识别患者因疾病而改变的基因表达。迄今为止,人们一直在对此类识别进行重点研究,并提出了以基因表达聚类为核心的多项建议。此外,虽然聚类被证明是划分受影响基因表达向量的有效方法,但全球范围内一直在研究如何优化聚类间基因表达变化的聚类计数。本研究提出了一种名为均值-最大值指数(MMI)的新指标来确定聚类数,该指标可根据基因表达变化将数据集划分为理想的聚类数。MMI 的工作原理是将聚类内成员间的差异最小化,而将聚类间的差异最大化。在这方面,研究是在公开可用的数据集上进行的,该数据集包括肺病、白血病和结肠癌三种疾病的基因表达。肺病患者的正常和患病数据分别为 10 和 86,白血病患者的正常和患病数据分别为 43 和 13,结肠癌患者的正常和患病数据分别为 18 和 18。三种疾病的基因表达向量分别为 7129、22283 和 6600。本研究使用了三种聚类模型,分别是 K-均值聚类、围绕 Medoid 的分区聚类和模糊 C-均值聚类,所有模型都使用了建议的 MMI 技术来最终确定聚类数量。对比分析表明,与其他聚类有效性指数相比,所提出的 MMI 指数能够识别出更多的真阳性(生物富集)癌症介导基因,其准确率提高了 85%,可谓优于其他指数。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Multimedia Tools and Applications
Multimedia Tools and Applications 工程技术-工程:电子与电气
CiteScore
7.20
自引率
16.70%
发文量
2439
审稿时长
9.2 months
期刊介绍: Multimedia Tools and Applications publishes original research articles on multimedia development and system support tools as well as case studies of multimedia applications. It also features experimental and survey articles. The journal is intended for academics, practitioners, scientists and engineers who are involved in multimedia system research, design and applications. All papers are peer reviewed. Specific areas of interest include: - Multimedia Tools: - Multimedia Applications: - Prototype multimedia systems and platforms
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信