An optimized cluster validity index for identification of cancer mediating genes

IF 3; Zone 4 (Computer Science); Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Multimedia Tools and Applications. Pub Date: 2024-09-09. DOI: 10.1007/s11042-024-20105-1

Subir Hazra, Anupam Ghosh


Citations: 0

Abstract

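One of the major challenges in bioinformatics is identifying the gene expressions that are altered in a person affected by a medical condition. Research to date has focused on this identification problem and has produced multiple proposals centred on clustering gene expressions. While clustering has proved an effective way to demarcate the affected gene expression vectors, the cluster count that best captures the gene expression variation among clusters remains an active research question. This study proposes a new index, the mean-max index (MMI), to determine the cluster count that divides the data collection into an ideal number of clusters based on gene expression variations. MMI works on the principle of minimizing the intra-cluster variation among members and maximizing the inter-cluster variation. The study was conducted on publicly available datasets of gene expressions for three diseases: lung disease, leukaemia, and colon cancer. The numbers of normal and diseased samples are 10 and 86 for lung disease, 43 and 13 for leukaemia, and 18 and 18 for colon cancer, respectively. The gene expression vectors for the three diseases number 7129, 22283, and 6600, respectively. Three clustering models were used, namely k-means, partitioning around medoids, and fuzzy c-means, all using the proposed MMI to finalize the cluster count. The comparative analysis indicates that the proposed MMI recognizes many more true-positive (biologically enriched) cancer mediating genes than other cluster validity indices, and it can be considered superior to them, with accuracy enhanced by 85%.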
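The abstract does not give the MMI formula, so the following is only a minimal sketch of the general idea it describes: run a clustering model for several candidate cluster counts and keep the count whose partition is most compact and best separated. Everything named here is an assumption for illustration. `standin_index` is a generic ratio of mean intra-cluster distance to minimum inter-centroid distance (lower is better), not the authors' MMI; the data are synthetic; and the k-means routine uses a simple farthest-point seeding chosen only to make the example deterministic.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic seeding (an assumption of this sketch): start at X[0],
    then repeatedly add the point farthest from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        chosen = np.array(centers)
        dist = np.linalg.norm(X[:, None, :] - chosen[None, :, :], axis=2)
        centers.append(X[np.argmax(dist.min(axis=1))])
    return np.array(centers)


def assign(X, centers):
    """Label each row of X with the index of its nearest center."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centers)
        # Recompute each centroid; keep the old one if a cluster went empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers


def standin_index(X, labels, centers):
    """Stand-in validity index (NOT the paper's MMI): mean distance of points
    to their own centroid divided by the smallest centroid-to-centroid
    distance. Lower values mean tighter, better separated clusters."""
    intra = np.mean(np.linalg.norm(X - centers[labels], axis=1))
    pairwise = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    inter = pairwise[np.triu_indices(len(centers), k=1)].min()
    return intra / inter


# Synthetic "expression" matrix: 120 rows (genes) x 4 columns (samples),
# drawn around three well separated group means.
rng = np.random.default_rng(0)
group_means = np.array([[0, 0, 0, 0], [10, 10, 0, 0], [0, 0, 10, 10]], dtype=float)
X = np.vstack([m + 0.5 * rng.standard_normal((40, 4)) for m in group_means])

# Score each candidate cluster count and keep the one with the lowest index.
scores = {}
for k in range(2, 7):
    labels, centers = kmeans(X, k)
    scores[k] = standin_index(X, labels, centers)

best_k = min(scores, key=scores.get)
print("index per k:", {k: round(float(v), 3) for k, v in scores.items()})
print("selected cluster count:", best_k)
```

The same selection loop would apply unchanged to partitioning around medoids or fuzzy c-means by swapping the `kmeans` call for another routine that returns labels and cluster representatives.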




Source journal

Multimedia Tools and Applications (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore: 7.20

Self-citation rate: 16.70%

Articles published: 2439

Review time: 9.2 months

About the journal: Multimedia Tools and Applications publishes original research articles on multimedia development and system support tools, as well as case studies of multimedia applications. It also features experimental and survey articles. The journal is intended for academics, practitioners, scientists, and engineers involved in multimedia system research, design, and applications. All papers are peer reviewed. Specific areas of interest include multimedia tools, multimedia applications, and prototype multimedia systems and platforms.