Title: Joint Lesion Detection and Classification of Breast Ultrasound Video via a Clinical Knowledge-Aware Framework
Authors: Minglei Li; Wushuang Gong; Pengfei Yan; Xiang Li; Yuchen Jiang; Hao Luo; Hang Zhou; Shen Yin
DOI: 10.1109/TCSVT.2024.3452497
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 1, pp. 45-61 (JCR Q1, Engineering, Electrical & Electronic; Impact Factor 8.3)
Publication date: 2024-08-30 (Journal Article; not open access)
URL: https://ieeexplore.ieee.org/document/10659844/
Citations: 0
Abstract
Ultrasound is an important routine screening modality for breast cancer. Breast ultrasound screening is a dynamic process, and in clinical practice radiologists record representative frames during dynamic scanning for subsequent diagnosis. However, existing computer-assisted diagnosis methods often concentrate on static diagnostic results obtained by analyzing only these representative frames, ignoring the valuable information in the dynamic examination process that facilitates diagnosis. Moreover, breast lesions can exhibit varying characteristics during scanning, so learning effective lesion representations is challenging and may affect the clinical interpretability of such methods. To this end, we draw insights from the behavior of radiologists during dynamic breast examination and leverage knowledge of breast anatomy to propose a clinical knowledge-aware framework for lesion detection and classification in breast ultrasound videos. The framework is equipped with global-local attentive aggregation and a dynamic allocation mechanism that simulates how radiologists search for diagnostic clues, thereby integrating local localization and global semantic information from the video into the lesion's feature representation. An anatomically-aware transformer is also designed to refine the lesion feature representation using spatial relationships within and across the anatomical layers of the breast. Extensive experiments show that the proposed framework achieves competitive performance in both lesion detection and video classification while exhibiting good clinical applicability and interpretability, with an average precision of 40.80% and an AUC of 85.86% on our constructed breast video dataset, and an average precision of 39.79% and an AUC of 87.04% on a publicly available dataset.
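The global-local attentive aggregation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names, feature dimensions, dot-product attention, and the simple averaging fusion are not taken from the paper, whose actual architecture may differ substantially.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def global_local_aggregate(frame_feats, query):
    """Hypothetical sketch: attend over per-frame (local) features with a
    lesion query, then fuse with a global video-level context feature."""
    # frame_feats: (T, D) per-frame local features; query: (D,) lesion query.
    scores = frame_feats @ query / np.sqrt(frame_feats.shape[1])  # (T,) scaled dot-product scores
    weights = softmax(scores)                                     # attention over frames
    local = weights @ frame_feats                                 # attended local summary
    global_ctx = frame_feats.mean(axis=0)                         # global semantic context
    return 0.5 * (local + global_ctx)                             # fused lesion representation

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))   # 8 frames, 16-dim features
q = rng.standard_normal(16)
fused = global_local_aggregate(feats, q)
print(fused.shape)  # (16,)
```

The attention weights play the role the abstract attributes to the radiologist-like search for diagnostic clues (emphasizing informative frames), while the mean-pooled term stands in for the global video semantics; the paper's dynamic allocation mechanism and anatomically-aware transformer are not modeled here.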
Journal introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.