Longitudinal image-based prediction of surgical intervention in infants with hydronephrosis using deep learning: Is a single ultrasound enough?

IF 7.7

PLOS digital health Pub Date : 2025-08-04 eCollection Date: 2025-08-01 DOI:10.1371/journal.pdig.0000939

Adree Khondker, Stanley Bryan Z Hua, Jethro C C Kwong, Kunj Sheth, Daniel Alvarez, Kyla N Velaer, John Weaver, Alice Xiang, Gregory E Tasian, Armando J Lorenzo, Anna Goldenberg, Mandy Rickard, Lauren Erdman


Citations: 0

Abstract

The potential of deep learning to predict renal obstruction from kidney ultrasound images has been demonstrated. However, these image-based classifiers have used information from only single-visit ultrasounds. Here, we developed machine learning (ML) models that incorporate ultrasounds from multiple clinic visits for hydronephrosis to generate a hydronephrosis severity index score that stratifies patients into high versus low risk of needing pyeloplasty, and compared these against models trained with data from a single clinic visit. We included patients followed for hydronephrosis at three institutions. The outcome of interest was low versus high risk of obstructive hydronephrosis requiring pyeloplasty. The models were trained on data from Toronto, ON, validated on an internal holdout set, and tested on an internal prospective set and on data from two external institutions. We developed single-visit models, trained with a single ultrasound, and multi-visit models using average prediction, convolutional pooling, long short-term memory, and temporal shift models. We compared model performance by area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). A total of 794 patients were included (603 SickKids, 102 Stanford, and 89 CHOP), with pyeloplasty rates of 12%, 5%, and 67%, respectively. There was no significant difference between single-visit ultrasound (US) models developed using the first ultrasound and those developed using the latest ultrasound. Comparing single-visit with multi-visit models, no multi-visit model produced an AUROC or AUPRC significantly greater than that of the single-visit models. We developed ML models for hydronephrosis that incorporate multi-visit inference across multiple institutions but did not demonstrate superiority over single-visit inference. These results imply that single-visit models would be sufficient to aid accurate risk stratification from single, early ultrasound images.
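As a rough illustration of the comparison described in the abstract, the sketch below aggregates per-visit risk scores by simple averaging (the "average prediction" multi-visit strategy) and scores the result with AUROC and AUPRC. It is not the authors' implementation: the function names, the use of average precision as the AUPRC estimate, and the four-patient data set are all assumptions made up for this example, and the numbers carry no clinical meaning.

```python
import numpy as np

def average_prediction(visit_scores):
    """Collapse each patient's per-visit risk scores into one score by averaging."""
    return np.array([float(np.mean(scores)) for scores in visit_scores])

def auroc(y_true, y_score):
    """AUROC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

def auprc(y_true, y_score):
    """Average precision: mean precision at the rank of each positive case.

    Tied scores are ordered arbitrarily (stable sort), which is an
    approximation chosen for brevity in this sketch.
    """
    y_true = np.asarray(y_true, dtype=bool)
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    hits = y_true[order]
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision[hits].mean())

# Synthetic, hypothetical data: one list of per-visit scores per patient,
# and a label of 1 if the patient went on to pyeloplasty.
visits = [[0.2, 0.1], [0.7, 0.9, 0.8], [0.4], [0.6, 0.3]]
labels = [0, 1, 0, 1]

single_visit = np.array([v[0] for v in visits])  # first ultrasound only
multi_visit = average_prediction(visits)         # mean over all visits

for name, scores in [("single-visit", single_visit), ("multi-visit", multi_visit)]:
    print(name, "AUROC:", auroc(labels, scores), "AUPRC:", auprc(labels, scores))
```

Averaging is the simplest of the multi-visit strategies named in the abstract because it needs no additional training; convolutional pooling, long short-term memory, and temporal shift models would instead learn how to combine visits.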

