Effects and methodology for grid subdivision of CT-based texture for unsupervised clustering

Symposium on Medical Information Processing and Analysis | Pub Date: 2023-03-06 | DOI: 10.1117/12.2670140

K. M. Cunanan, B. Varghese, Yenlin Lee, Raghda Abouelnaga, Ramin Eghtesadi, D. Hwang, V. Duddalwar, S. Cen

Volume: 12567

Citations: 0

Abstract

t-Distributed Stochastic Neighbor Embedding (t-SNE) and k-means are increasingly used for dimension reduction and graphical illustration in medical imaging (e.g., CT) informatics. Mapping a grid network onto a slide is a prerequisite for implementing cluster analysis. Traditionally, the performance of cluster analysis is driven by hyperparameters; however, grid size, which also affects performance, is often set arbitrarily. In this study, we used CT images of renal masses to evaluate how grid size, together with the t-SNE perplexity and learning-rate hyperparameters, affects unsupervised clustering. The number of clusters was determined by the gap statistic. Grid sizes of 2x2, 4x4, 5x5, and 8x8 were evaluated. The number of output clusters increased as grid size decreased from 8x8 to 4x4; however, at 2x2 the model yielded the same number of clusters as at 8x8. This finding was consistent across different hyperparameter settings. Although the large (8x8) and small (2x2) grids yielded the same number of clusters, we conducted additional analyses of the nesting structure between their cluster memberships (the mutually exclusive cluster number assigned to each grid cell in a cluster analysis). The cluster memberships of the large and small grids overlapped only partially, suggesting that the small grid detects additional patterns/information. In conclusion, grid size should be treated as an additional hyperparameter when using unsupervised clustering methods for pattern recognition in medical imaging analysis.
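The abstract does not say which texture features were computed or exactly how the grid is laid over the region of interest. A minimal sketch of grid subdivision, assuming "grid size" means the edge length in pixels of square, non-overlapping cells (so a 2x2 grid yields more, smaller cells than an 8x8 grid), and using hypothetical first-order statistics (mean, standard deviation, range) in place of the study's actual texture features:

```python
import numpy as np

# Sketch only: assumes grid size = cell edge length in pixels, and the
# per-cell features are hypothetical stand-ins for real texture features.


def grid_tiles(image: np.ndarray, tile: int) -> np.ndarray:
    """Split a 2-D image into non-overlapping tile x tile cells.

    Rows/columns that do not fill a whole cell are cropped.
    Returns shape (n_rows, n_cols, tile, tile).
    """
    h, w = image.shape
    n_rows, n_cols = h // tile, w // tile
    cropped = image[: n_rows * tile, : n_cols * tile]
    # (n_rows, tile, n_cols, tile) -> (n_rows, n_cols, tile, tile)
    return cropped.reshape(n_rows, tile, n_cols, tile).swapaxes(1, 2)


def tile_features(image: np.ndarray, tile: int) -> np.ndarray:
    """Hypothetical per-cell features: mean, standard deviation, range.

    Returns shape (n_cells, 3), with cells ordered row-major over the grid.
    """
    cells = grid_tiles(image, tile).reshape(-1, tile * tile).astype(float)
    return np.column_stack([
        cells.mean(axis=1),
        cells.std(axis=1),
        cells.max(axis=1) - cells.min(axis=1),
    ])


# Toy 16x16 "ROI": smaller grid sizes produce more cells to cluster.
roi = np.random.default_rng(0).normal(size=(16, 16))
for size in (8, 5, 4, 2):
    print(size, tile_features(roi, size).shape)
# prints, one per line: 8 (4, 3) / 5 (9, 3) / 4 (16, 3) / 2 (64, 3)
```

Incomplete cells at the right and bottom edges are simply cropped here (visible in the 5x5 case); padding or masking to the ROI boundary are equally plausible choices that the abstract does not settle.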
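To illustrate how the gap statistic picks the number of clusters, here is a NumPy-only sketch of the standard procedure of Tibshirani, Walther and Hastie: compare the log within-cluster dispersion W_k of the data with its average over uniform reference sets drawn from the data's bounding box, and choose the smallest k with Gap(k) >= Gap(k+1) - s_(k+1). `X` stands for whatever k-means is run on; in the pipeline the abstract describes, that would presumably be the t-SNE embedding of the grid-cell features (scikit-learn's `sklearn.manifold.TSNE` exposes the relevant hyperparameters as `perplexity` and `learning_rate`). The hand-rolled `kmeans` helper, the uniform reference distribution, and the defaults for `k_max`, `n_refs`, and `seed` are assumptions, not the authors' settings.

```python
import numpy as np

# Sketch only: hand-rolled k-means plus a uniform bounding-box reference;
# not the authors' code, and the default parameters are arbitrary.


def _assign(X, centers):
    # Squared Euclidean distance from every point to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def _within_ss(X, labels):
    # Pooled within-cluster sum of squares around each cluster mean (W_k).
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
               for j in np.unique(labels))


def kmeans(X, k, rng, n_init=10, n_iter=100):
    """Lloyd's k-means with random-point initialisation; returns (labels, W_k)."""
    best = (None, np.inf)
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = _assign(X, centers)
            # Empty clusters keep their previous center.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = _assign(X, centers)
        w = _within_ss(X, labels)
        if w < best[1]:
            best = (labels, w)
    return best


def gap_statistic(X, k_max=8, n_refs=10, seed=0):
    """Return (chosen k, list of Gap(k) for k = 1..k_max)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans(X, k, rng)[1])
        # Reference sets: uniform over the bounding box of the data.
        ref = np.array([np.log(kmeans(rng.uniform(lo, hi, size=X.shape), k, rng)[1])
                        for _ in range(n_refs)])
        gaps.append(ref.mean() - log_w)
        s.append(ref.std() * np.sqrt(1 + 1 / n_refs))
    # Smallest k with Gap(k) >= Gap(k+1) - s_(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps


# Toy data: three well-separated 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in ((0, 0), (10, 0), (0, 10))])
k, gaps = gap_statistic(X, k_max=6)
print(k, np.round(gaps, 2))
```

The random-restart loop (`n_init`) matters: a single unlucky initialisation inflates W_k and can shift the chosen k, which is one reason the selected cluster count can be sensitive to upstream choices such as grid size.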
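One way to examine the nesting structure between the 8x8 and 2x2 memberships is to cross-tabulate each small cell's cluster label against the label of the large cell that contains it. The sketch below assumes both grids are anchored at the same ROI corner, so that with edge lengths 8 and 2 each large cell contains a 4x4 block of small cells; `membership_crosstab` and the `nesting_score` summary are hypothetical helpers, not measures reported in the abstract.

```python
import numpy as np

# Sketch only: assumes both grids share the ROI origin; nesting_score is a
# hypothetical summary, not a statistic reported by the study.


def membership_crosstab(large_labels, small_labels, factor):
    """Count small-grid cells by (parent large-cell label, small-cell label).

    large_labels: (R, C) non-negative int labels, one per large cell.
    small_labels: non-negative int labels per small cell covering at least
                  (R*factor, C*factor); cells outside the large grid are dropped.
    factor: large cell edge // small cell edge (e.g. 8 // 2 = 4).
    Returns counts with counts[a, b] = number of small cells labelled b
    that lie inside large cells labelled a.
    """
    R, C = large_labels.shape
    small = small_labels[: R * factor, : C * factor]
    # Upsample large labels so each small cell carries its parent's label.
    parent = np.repeat(np.repeat(large_labels, factor, axis=0), factor, axis=1)
    counts = np.zeros((parent.max() + 1, small.max() + 1), dtype=int)
    np.add.at(counts, (parent.ravel(), small.ravel()), 1)
    return counts


def nesting_score(counts):
    """Share of small cells that fall in their small cluster's most common
    large cluster (1.0 = every small-grid cluster nested in one large one)."""
    return counts.max(axis=0).sum() / counts.sum()


large = np.array([[0, 1]])                 # one row of two large cells
small = np.array([[0, 1, 1, 1, 2],         # 2x5 small grid; the last column
                  [0, 0, 1, 1, 2]])        # lies outside the large grid
counts = membership_crosstab(large, small, factor=2)
print(counts.tolist())                     # [[3, 1], [0, 4]]
print(nesting_score(counts))               # 0.875
```

Because cluster labels are arbitrary, the contingency table (not label equality) is what carries the comparison. A score of 1.0 would mean full nesting; anything lower corresponds to the kind of partial overlap the abstract describes.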
