Unsupervised clustering using multiple correspondence analysis reveals clinically-relevant demographic variables across multiple gastrointestinal cancers

Ryan J. Kramer, Kristen E. Rhodin, Aaron Therien, Vignesh Raman, Austin Eckhoff, Camryn Thompson, Betty C. Tong, Dan G. Blazer III, Michael E. Lidsky, Thomas D’Amico, Daniel P. Nussbaum
Surgical Oncology Insight. Published 2024-02-03. DOI: 10.1016/j.soi.2024.100009. Full text: https://www.sciencedirect.com/science/article/pii/S2950247024000057

Abstract

Objective

Patients with gastrointestinal malignancies represent a heterogeneous population, even among those with similar stage and treatment pathways. Here, we used dimensionality reduction in the National Cancer Database (NCDB) to inform unsupervised clustering of patients with three gastrointestinal malignancies and examined outcomes among these computationally derived groups.

Methods

The NCDB was queried for three cohorts of patients receiving multimodal therapy: stage II/III esophageal cancer, stage II/III gastric cancer, and stage III colon cancer. Multiple correspondence analysis (MCA), a dimensionality reduction technique well suited to categorical variables such as the demographic data in the NCDB, was performed on each cohort using demographic and tumor characteristics as input variables. Principal components were analyzed to derive clusters. Outcomes for each cluster were compared using Kaplan-Meier survival methods.
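As a rough illustration of the MCA step, the sketch below one-hot encodes a toy table of categorical variables, performs correspondence analysis on the indicator matrix via a singular value decomposition, and splits patients on the sign of the first dimension. Everything here is an assumption made for illustration: the six patient records are invented, the helper names `indicator_matrix` and `mca` are hypothetical, NumPy is an assumed dependency, and the sign-based split is only a stand-in, since the abstract does not state which clustering algorithm or software was used.

```python
import numpy as np

def indicator_matrix(rows):
    """One-hot encode categorical rows into an indicator (disjunctive) matrix."""
    columns = []  # (variable index, category) for every observed category
    for v in range(len(rows[0])):
        for cat in sorted({row[v] for row in rows}):
            columns.append((v, cat))
    Z = np.array([[1.0 if row[v] == cat else 0.0 for v, cat in columns]
                  for row in rows])
    return Z, columns

def mca(rows, n_components=2):
    """MCA as correspondence analysis of the indicator matrix."""
    Z, columns = indicator_matrix(rows)
    P = Z / Z.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    inertia = sing ** 2                # principal inertia of each dimension
    coords = (U * sing) / np.sqrt(r)[:, None]  # row principal coordinates
    return coords[:, :n_components], inertia, columns

# Invented toy records: (income quartile, education quartile, age quartile, insurance).
patients = [
    ("income_q4", "edu_q4", "age_q1", "private"),
    ("income_q4", "edu_q3", "age_q2", "private"),
    ("income_q3", "edu_q4", "age_q1", "private"),
    ("income_q1", "edu_q1", "age_q4", "medicare"),
    ("income_q1", "edu_q2", "age_q4", "medicaid"),
    ("income_q2", "edu_q1", "age_q3", "medicare"),
]

coords, inertia, columns = mca(patients)

# Stand-in clustering (an assumption): split on the sign of the first dimension.
labels = [bool(x > 0) for x in coords[:, 0]]
print(labels)
```

The toy records are deliberately built so that the two groups share no categories, which makes the first dimension separate them cleanly; real registry data would not be this tidy, and a proper clustering method on several dimensions would be needed.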

Results

For esophageal (n = 11,399), gastric (n = 2,033), and colon (n = 72,057) cancer, the same four variables were identified as highly representative: income quartile, education quartile, age quartile, and insurance type. Survival analysis demonstrated significant differences in overall survival between clusters in esophageal (p < 0.0001) and colon (p < 0.0001) cancer, but not in gastric cancer (p = 0.56). Clusters defined by high income, high education, younger age, and private insurance had better survival.
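To make the survival comparison concrete, the following sketch computes Kaplan-Meier curves for two clusters and compares them with a two-group log-rank test. It rests on several assumptions: the follow-up times and event flags are invented, the function names `kaplan_meier` and `logrank` are hypothetical, and the log-rank test is assumed as the comparison method because the abstract reports p-values without naming the test. The study may also have compared more than two clusters, which would need the multi-group form of the test.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up durations; events: 1 = death observed, 0 = censored.
    """
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
              [(t, e, 1) for t, e in zip(times_b, events_b)])
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for u, _, _ in pooled if u >= t)                  # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)     # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e)            # deaths, both groups
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function for 1 df
    return chi2, p

# Invented follow-up data in months; 1 = death observed, 0 = censored.
times_a, events_a = [6, 13, 21, 30, 37, 48], [1, 0, 1, 0, 0, 0]
times_b, events_b = [4, 8, 11, 15, 22, 29], [1, 1, 1, 1, 0, 1]

curve_a = kaplan_meier(times_a, events_a)
curve_b = kaplan_meier(times_b, events_b)
chi2, p_value = logrank(times_a, events_a, times_b, events_b)
print(curve_a, curve_b, chi2, p_value)
```

The quadratic-time loops keep the logic readable on a toy example; at the cohort sizes reported here, a dedicated survival analysis library would be the practical choice.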

Conclusions

Using MCA, we identified combinations of four demographic variables that define patient groupings among NCDB patients with stage II/III esophageal cancer, stage II/III gastric cancer, and stage III colon cancer. These groupings had significantly different survival outcomes in colon and esophageal cancer. This work serves as proof-of-concept for the utility of unsupervised clustering in outcomes research for surgical malignancies and identifies at-risk populations.
