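Interpretable Machine Learning to Predict the Malignancy Risk of Follicular Thyroid Neoplasms in Extremely Unbalanced Data: Retrospective Cohort Study and Literature Review.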

IF 3.3 Q2 ONCOLOGY
JMIR Cancer Pub Date : 2025-02-10 DOI:10.2196/66269
Rui Shan, Xin Li, Jing Chen, Zheng Chen, Yuan-Jia Cheng, Bo Han, Run-Ze Hu, Jiu-Ping Huang, Gui-Lan Kong, Hui Liu, Fang Mei, Shi-Bing Song, Bang-Kai Sun, Hui Tian, Yang Wang, Wu-Cai Xiao, Xiang-Yun Yao, Jing-Ming Ye, Bo Yu, Chun-Hui Yuan, Fan Zhang, Zheng Liu
Citations: 0

Abstract

Interpretable Machine Learning to Predict the Malignancy Risk of Follicular Thyroid Neoplasms in Extremely Unbalanced Data: Retrospective Cohort Study and Literature Review.

Background: Diagnosing and managing follicular thyroid neoplasms (FTNs) remains a significant challenge, as the malignancy risk cannot be determined until after diagnostic surgery.

Objective: We aimed to use interpretable machine learning to predict the malignancy risk of FTNs preoperatively in a real-world setting.

Methods: We conducted a retrospective cohort study at the Peking University Third Hospital in Beijing, China. Patients with postoperative pathological diagnoses of follicular thyroid adenoma (FTA) or follicular thyroid carcinoma (FTC) were included, excluding those without preoperative thyroid ultrasonography. We used 22 predictors involving demographic characteristics, thyroid sonography, and hormones to train 5 machine learning models: logistic regression, least absolute shrinkage and selection operator regression, random forest, extreme gradient boosting, and support vector machine. The optimal model was selected based on discrimination, calibration, interpretability, and parsimony. To address the highly imbalanced data (FTA:FTC ratio>5:1), model discrimination was assessed using both the area under the receiver operating characteristic curve and the area under the precision-recall curve (AUPRC). To interpret the model, we used Shapley Additive Explanations values and partial dependence and individual conditional expectation plots. Additionally, a systematic review was performed to synthesize existing evidence and validate the discrimination ability of the previously developed Thyroid Imaging Reporting and Data System for Follicular Neoplasm scoring criteria to differentiate between benign and malignant FTNs using our data.
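The two discrimination metrics named in the Methods can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the 12-tumor toy labels, and the scores are all hypothetical, chosen only to mimic a roughly 5:1 benign-to-malignant ratio. AUPRC is approximated here by average precision (the step-wise estimate), and the scores are assumed to contain no ties.

```python
# Illustrative sketch only; toy data and function names are assumptions,
# not taken from the study.

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties counted as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise AUPRC estimate: mean of the precision values observed
    at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical data: 2 malignant (1) and 10 benign (0) tumors, i.e. 5:1.
y_true = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.90, 0.80, 0.30, 0.20, 0.10, 0.15,
          0.60, 0.70, 0.25, 0.05, 0.35, 0.40]

roc_value = auroc(y_true, scores)
ap_value = average_precision(y_true, scores)
prevalence = sum(y_true) / len(y_true)

print(f"AUROC = {roc_value:.2f}")                         # 0.90
print(f"Average precision (AUPRC) = {ap_value:.2f}")      # 0.75
print(f"Positive prevalence = {prevalence:.3f}")
```

In this toy case only two benign tumors outrank one of the malignant ones, yet average precision falls to 0.75 while AUROC stays at 0.90. The example suggests why precision-based metrics tend to be more sensitive to false positives when the positive class is rare.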

Results: The cohort included 1539 patients (mean age 47.98, SD 14.15 years; female: n=1126, 73.16%) with 1672 FTN tumors (FTA: n=1414; FTC: n=258; FTA:FTC ratio=5.5). The random forest model emerged as optimal, identifying mean thyroid-stimulating hormone (TSH) score, mean tumor diameter, mean TSH, TSH instability, and TSH measurement levels as the top 5 predictors in discriminating FTA from FTC, with the area under the receiver operating characteristic curve of 0.79 (95% CI 0.77-0.81) and AUPRC of 0.40 (95% CI 0.37-0.44). Malignancy risk increased nonlinearly with larger tumor diameters and higher TSH instability but decreased nonlinearly with higher mean TSH scores or mean TSH levels. FTCs with small sizes (mean diameter 2.88, SD 1.38 cm) were more likely to be misclassified as FTAs compared to larger ones (mean diameter 3.71, SD 1.36 cm). The systematic review of the 7 included studies revealed that (1) the FTA:FTC ratio varied from 0.6 to 4.0, lower than the natural distribution of 5.0; (2) no studies assessed prediction performance using AUPRC in unbalanced datasets; and (3) external validations of Thyroid Imaging Reporting and Data System for Follicular Neoplasm scoring criteria underperformed relative to the original study.
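The reported counts can be sanity-checked with a few lines of arithmetic. The sketch below uses only figures stated in the Results. The "lift" it computes is an illustrative reading added here, not a quantity the authors report, and it assumes the evaluation data had the same FTC prevalence as the full cohort. It relies on the standard fact that a no-skill classifier has an expected AUPRC equal to the positive-class prevalence.

```python
# Counts taken from the Results paragraph; variable names are illustrative.
n_patients = 1539
n_female = 1126
n_tumors = 1672
n_fta = 1414          # follicular thyroid adenoma (benign)
n_ftc = 258           # follicular thyroid carcinoma (malignant)
reported_auprc = 0.40

# The two tumor classes should account for every tumor in the cohort.
assert n_fta + n_ftc == n_tumors

fta_to_ftc = n_fta / n_ftc              # class imbalance ratio
ftc_prevalence = n_ftc / n_tumors       # no-skill AUPRC baseline
female_share = n_female / n_patients

# Assumption: the evaluation set shares the cohort's FTC prevalence.
lift_over_baseline = reported_auprc / ftc_prevalence

print(f"FTA:FTC ratio      = {fta_to_ftc:.2f}")        # 5.48, reported as 5.5
print(f"FTC prevalence     = {ftc_prevalence:.4f}")    # 0.1543
print(f"Female share       = {female_share:.2%}")      # 73.16%
print(f"AUPRC vs. baseline = {lift_over_baseline:.2f}x")
```

Under that assumption, an AUPRC of 0.40 is roughly 2.6 times the no-skill baseline of about 0.15, which puts the modest absolute value in context.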

Conclusions: Tumor size and TSH measurements were important in screening FTN malignancy risk preoperatively, but accurately predicting the risk of small-sized FTNs remains challenging. Future research should address the limitations posed by the extreme imbalance in FTA and FTC distributions in real-world data.

Source journal
JMIR Cancer (Oncology)
CiteScore: 4.10
Self-citation rate: 0.00%
Articles published: 64
Review time: 12 weeks