Generalizability of lesion detection and segmentation when ScaleNAS is trained on a large multi-organ dataset and validated in the liver

IF 3.2 | Zone 2, Medicine | JCR Q1: Radiology, Nuclear Medicine & Medical Imaging
Medical Physics | Pub Date: 2024-11-22 | DOI: 10.1002/mp.17504
Jingchen Ma, Hao Yang, Yen Chou, Jin Yoon, Tavis Allison, Ravikumar Komandur, Jon McDunn, Asba Tasneem, Richard K. Do, Lawrence H Schwartz, Binsheng Zhao
Medical Physics, volume 52, issue 2, pages 1005-1018.
Citations: 0

Abstract

Background

Tumor assessment through imaging is crucial for diagnosing and treating cancer. Lesions in the liver, a common site for metastatic disease, are particularly challenging to accurately detect and segment. This labor-intensive task is subject to individual variation, which drives interest in automation using artificial intelligence (AI).

Purpose

Evaluate AI for lesion detection and lesion segmentation using CT in the context of human performance on the same task. Use internal testing to determine how an AI-developed model (ScaleNAS) trained on lesions in multiple organs performs when tested specifically on liver lesions in a dataset integrating real-world and clinical trial data. Use external testing to evaluate whether ScaleNAS's performance generalizes to publicly available colorectal liver metastases (CRLM) from The Cancer Imaging Archive (TCIA).

Methods

The CUPA study dataset (named for Columbia University, PRIME, and Amgen) included patients whose CT scan of the chest, abdomen, or pelvis at Columbia University between 2010 and 2020 indicated solid tumors (CUIMC, n = 5011) and patients from two clinical trials in metastatic colorectal cancer, PRIME (n = 1183) and Amgen (n = 463). Inclusion required ≥1 measurable lesion; exclusion criteria eliminated 1566 patients. Data were divided at the patient level into training (n = 3996), validation (n = 570), and testing (n = 1529) sets. To create the reference standard for training and validation, each case was annotated by one of six radiologists, randomly assigned, who marked the CUPA lesions without access to any previous annotations. For internal testing, we refined the CUPA test set to contain only patients who had liver lesions (n = 525) and formed an enhanced reference standard through expert consensus review of prior annotations. For external testing, TCIA-CRLM (n = 197) formed the test set. The reference standard for TCIA-CRLM was formed by consensus review of the original annotation and contours by two new radiologists. Metrics for lesion detection were sensitivity and false positives. Lesion segmentation was assessed with median Dice coefficient, under-segmentation ratio (USR), and over-segmentation ratio (OSR). Subgroup analysis examined the influence of lesion size ≥ 10 mm (measurable by RECIST 1.1) versus all lesions (important for early identification of disease progression).
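
The segmentation metrics named above can be illustrated with a minimal sketch. The abstract does not define USR or OSR, so the formulas below are assumptions: USR is taken as the fraction of the reference volume missed by the segmentation, and OSR as the segmented volume outside the reference, relative to the reference volume. Masks are represented as sets of voxel indices; the function names and the toy masks are hypothetical and are not study data.

```python
from statistics import median
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x) index of one lesion voxel


def dice(ref: Set[Voxel], seg: Set[Voxel]) -> float:
    """Dice = 2|R ∩ S| / (|R| + |S|); returns 1.0 when both masks are empty."""
    total = len(ref) + len(seg)
    if total == 0:
        return 1.0
    return 2.0 * len(ref & seg) / total


def under_segmentation_ratio(ref: Set[Voxel], seg: Set[Voxel]) -> float:
    """Assumed definition: |R \\ S| / |R| (reference voxels the model missed)."""
    if not ref:
        raise ValueError("reference mask is empty")
    return len(ref - seg) / len(ref)


def over_segmentation_ratio(ref: Set[Voxel], seg: Set[Voxel]) -> float:
    """Assumed definition: |S \\ R| / |R| (extra voxels, relative to the reference)."""
    if not ref:
        raise ValueError("reference mask is empty")
    return len(seg - ref) / len(ref)


def median_dice(pairs: List[Tuple[Set[Voxel], Set[Voxel]]]) -> float:
    """Median of per-lesion Dice values, matching the 'median Dice' summary."""
    return median(dice(ref, seg) for ref, seg in pairs)


# Toy one-dimensional "lesion": reference covers x = 0..9, segmentation x = 2..13.
ref_mask = {(0, 0, x) for x in range(10)}
seg_mask = {(0, 0, x) for x in range(2, 14)}

print(dice(ref_mask, seg_mask))                      # overlap of 8 voxels
print(under_segmentation_ratio(ref_mask, seg_mask))  # 2 of 10 reference voxels missed
print(over_segmentation_ratio(ref_mask, seg_mask))   # 4 extra voxels vs. 10 reference
```

Reporting USR and OSR alongside Dice separates the two ways a contour can be wrong, which a single overlap score hides.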

Results

ScaleNAS trained on all lesions achieved sensitivity of 71.4% and Dice of 70.2% for liver lesions in the CUPA internal test set (3,495 lesions) and sensitivity of 68.2% and Dice of 64.2% in the TCIA-CRLM external test set (638 lesions). Human radiologists had mean sensitivity of 53.5% and Dice of 73.9% in CUPA and sensitivity of 84.1% and Dice of 88.4% in TCIA-CRLM. Performance improved for ScaleNAS and radiologists in the subgroup of lesions that excluded sub-centimeter lesions.
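
As a rough illustration of the size-based subgroup analysis, the sketch below computes lesion-level sensitivity (detected reference lesions divided by all reference lesions) for all lesions and for lesions of at least 10 mm. The `Lesion` record, the `sensitivity` helper, and the toy values are hypothetical; the abstract does not describe how a detection is matched to a reference lesion, so `detected` is simply taken as given.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Lesion:
    diameter_mm: float  # longest diameter of the reference lesion
    detected: bool      # whether the reader or model found it (matching rule assumed)


def sensitivity(lesions: Iterable[Lesion], min_diameter_mm: float = 0.0) -> Optional[float]:
    """Fraction of reference lesions at or above the size cutoff that were detected."""
    subset = [lesion for lesion in lesions if lesion.diameter_mm >= min_diameter_mm]
    if not subset:
        return None  # no lesions in this subgroup
    return sum(lesion.detected for lesion in subset) / len(subset)


# Made-up example values, not results from the study.
toy_lesions = [
    Lesion(4.0, False), Lesion(7.0, True), Lesion(9.0, False), Lesion(11.0, True),
    Lesion(12.0, True), Lesion(15.0, True), Lesion(22.0, True), Lesion(30.0, False),
]

print(sensitivity(toy_lesions))        # all lesions: 5 of 8 detected
print(sensitivity(toy_lesions, 10.0))  # lesions >= 10 mm: 4 of 5 detected
```

In this toy set the missed lesions are mostly sub-centimeter, so sensitivity rises once they are excluded, which is the pattern the subgroup result describes.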

Conclusions

Our study presents the first evaluation of ScaleNAS in medical imaging, demonstrating its liver lesion detection and segmentation performance across diverse datasets. Using consensus reference standards from multiple radiologists, we addressed inter-observer variability and contributed to consistency in lesion annotation. While ScaleNAS does not surpass radiologists in performance, it offers fast and reliable results with potential utility in providing initial contours for radiologists. Future work will extend this model to lung and lymph node lesions, ultimately aiming to enhance clinical applications by generalizing detection and segmentation across tissue types.
