Artificial intelligence-enhanced ultrasound imaging for thyroid nodule detection and malignancy classification: a study on YOLOv11
Authors: Jiaqi Yang, Zhigang Luo, Yanting Wen, Jing Zhang
Journal: Quantitative Imaging in Medicine and Surgery, volume 15, issue 9, pages 7964-7976 (published 2025-09-01; Epub 2025-08-14)
Publication type: Journal Article
DOI: 10.21037/qims-2025-257 (https://doi.org/10.21037/qims-2025-257)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12397667/pdf/
Impact factor: 2.3; JCR Q2 (Radiology, Nuclear Medicine & Medical Imaging)
Abstract
Background: Thyroid nodules are a common clinical concern, and accurate diagnosis is critical for effective treatment and improved patient outcomes. Traditional ultrasound examinations rely heavily on the physician's experience, which can lead to diagnostic variability. The integration of artificial intelligence (AI) into medical imaging offers a promising way to improve diagnostic accuracy and efficiency. This study aimed to evaluate the effectiveness of the You Only Look Once version 11 (YOLOv11) model in detecting and classifying thyroid nodules in ultrasound images, with the goal of supporting real-time clinical decision-making and improving diagnostic workflows.
Methods: We used the YOLOv11 model to analyze a dataset of 1,503 thyroid ultrasound images, divided into training (1,203 images), validation (150 images), and test (150 images) sets. The images contained 742 benign and 778 malignant nodules. Data augmentation and transfer learning were applied to optimize model performance. YOLOv11 was compared with other YOLO variants (YOLOv3 to YOLOv10) and residual network 50 (ResNet50) to assess their diagnostic capabilities.
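The abstract gives only the sizes of the three subsets, not how the images were assigned to them. As an illustration of the arithmetic, the following minimal sketch reproduces a 1,203/150/150 partition under the assumption of a seeded, image-level random shuffle; the function name `split_indices` and the seed are hypothetical and are not taken from the study.

```python
import random
from typing import List, Tuple


def split_indices(
    n_total: int, n_val: int, n_test: int, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle image indices and cut them into train/val/test lists.

    Assumption: a plain random image-level split. The study may have
    used a different procedure (for example, splitting by patient).
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train, val, test = split_indices(n_total=1503, n_val=150, n_test=150)
print(len(train), len(val), len(test))  # 1203 150 150
```

The three lists are disjoint by construction, so no image appears in more than one subset.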
Results: The YOLOv11 model outperformed the other YOLO variants (YOLOv3 to YOLOv10) and ResNet50 in thyroid nodule detection. At an intersection over union (IoU) threshold of 0.5, YOLOv11 achieved a precision (P) of 0.841 and a recall (R) of 0.823, compared with a P of 0.8333 and an R of 0.8025 for ResNet50. Among the YOLO variants, YOLOv11 consistently achieved the highest P and R values. For benign nodules, YOLOv11 obtained a P of 0.835 and an R of 0.833; for malignant nodules, it reached a P of 0.846 and an R of 0.813. Within the YOLOv11 model itself, performance varied across IoU thresholds (0.25, 0.5, 0.7, and 0.9): P and R generally decreased as the IoU threshold increased.
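To make the dependence on the IoU threshold concrete, the sketch below computes IoU for axis-aligned boxes and then precision and recall at a given threshold. It is an illustration on toy boxes, not the study's evaluation code: the greedy one-to-one matching, the `(x1, y1, x2, y2)` box format, and the function names are all assumptions, and class labels are ignored.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Box], truths: List[Box], threshold: float
) -> Tuple[float, float]:
    """Greedy matching: each prediction claims its best unmatched truth.

    `preds` is assumed to be sorted by descending confidence.
    """
    matched = set()
    true_pos = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(p, t)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= threshold:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(truths) if truths else 0.0
    return precision, recall


# Toy example: two annotated nodules, three predicted boxes.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0, 0, 10, 10), (20, 20, 30, 28), (50, 50, 60, 60)]
for thr in (0.25, 0.5, 0.7, 0.9):
    p, r = precision_recall(preds, truths, thr)
    print(f"IoU >= {thr}: P = {p:.3f}, R = {r:.3f}")
```

In this toy case the second prediction overlaps its nodule with an IoU of 0.8, so it counts as a true positive at thresholds of 0.25, 0.5, and 0.7 but as a false positive at 0.9, which lowers both P and R at the strictest threshold.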
Conclusions: YOLOv11 proved to be a powerful tool for thyroid nodule detection and malignancy classification, offering high precision and real-time performance. These attributes are vital for dynamic ultrasound examinations and for improving diagnostic efficiency. Future research will focus on expanding datasets and validating the model's clinical utility in real-time settings.