通过整合 VoFoCD 数据集中的全局信息和局部特征改进喉镜图像分析

Journal of imaging informatics in medicine Pub Date : 2024-12-01 Epub Date: 2024-05-29 DOI:10.1007/s10278-024-01068-z

Thao Thi Phuong Dao, Tuan-Luc Huynh, Minh-Khoi Pham, Trung-Nghia Le, Tan-Cong Nguyen, Quang-Thuc Nguyen, Bich Anh Tran, Boi Ngoc Van, Chanh Cong Ha, Minh-Triet Tran

{"title":"通过整合 VoFoCD 数据集中的全局信息和局部特征改进喉镜图像分析","authors":"Thao Thi Phuong Dao, Tuan-Luc Huynh, Minh-Khoi Pham, Trung-Nghia Le, Tan-Cong Nguyen, Quang-Thuc Nguyen, Bich Anh Tran, Boi Ngoc Van, Chanh Cong Ha, Minh-Triet Tran","doi":"10.1007/s10278-024-01068-z","DOIUrl":null,"url":null,"abstract":"The diagnosis and treatment of vocal fold disorders heavily rely on the use of laryngoscopy. A comprehensive vocal fold diagnosis requires accurate identification of crucial anatomical structures and potential lesions during laryngoscopy observation. However, existing approaches have yet to explore the joint optimization of the decision-making process, including object detection and image classification tasks simultaneously. In this study, we provide a new dataset, VoFoCD, with 1724 laryngology images designed explicitly for object detection and image classification in laryngoscopy images. Images in the VoFoCD dataset are categorized into four classes and comprise six glottic object types. Moreover, we propose a novel Multitask Efficient trAnsformer network for Laryngoscopy (MEAL) to classify vocal fold images and detect glottic landmarks and lesions. To further facilitate interpretability for clinicians, MEAL provides attention maps to visualize important learned regions for explainable artificial intelligence results toward supporting clinical decision-making. We also analyze our model's effectiveness in simulated clinical scenarios where shaking of the laryngoscopy process occurs. The proposed model demonstrates outstanding performance on our VoFoCD dataset. The accuracy for image classification and mean average precision at an intersection over a union threshold of 0.5 (mAP50) for object detection are 0.951 and 0.874, respectively. Our MEAL method integrates global knowledge, encompassing general laryngoscopy image classification, into local features, which refer to distinct anatomical regions of the vocal fold, particularly abnormal regions, including benign and malignant lesions. Our contribution can effectively aid laryngologists in identifying benign or malignant lesions of vocal folds and classifying images in the laryngeal endoscopy process visually.","PeriodicalId":516858,"journal":{"name":"Journal of imaging informatics in medicine","volume":" ","pages":"2794-2809"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11612113/pdf/","citationCount":"0","resultStr":"{\"title\":\"Improving Laryngoscopy Image Analysis Through Integration of Global Information and Local Features in VoFoCD Dataset.\",\"authors\":\"Thao Thi Phuong Dao, Tuan-Luc Huynh, Minh-Khoi Pham, Trung-Nghia Le, Tan-Cong Nguyen, Quang-Thuc Nguyen, Bich Anh Tran, Boi Ngoc Van, Chanh Cong Ha, Minh-Triet Tran\",\"doi\":\"10.1007/s10278-024-01068-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The diagnosis and treatment of vocal fold disorders heavily rely on the use of laryngoscopy. A comprehensive vocal fold diagnosis requires accurate identification of crucial anatomical structures and potential lesions during laryngoscopy observation. However, existing approaches have yet to explore the joint optimization of the decision-making process, including object detection and image classification tasks simultaneously. In this study, we provide a new dataset, VoFoCD, with 1724 laryngology images designed explicitly for object detection and image classification in laryngoscopy images. Images in the VoFoCD dataset are categorized into four classes and comprise six glottic object types. Moreover, we propose a novel Multitask Efficient trAnsformer network for Laryngoscopy (MEAL) to classify vocal fold images and detect glottic landmarks and lesions. To further facilitate interpretability for clinicians, MEAL provides attention maps to visualize important learned regions for explainable artificial intelligence results toward supporting clinical decision-making. We also analyze our model's effectiveness in simulated clinical scenarios where shaking of the laryngoscopy process occurs. The proposed model demonstrates outstanding performance on our VoFoCD dataset. The accuracy for image classification and mean average precision at an intersection over a union threshold of 0.5 (mAP50) for object detection are 0.951 and 0.874, respectively. Our MEAL method integrates global knowledge, encompassing general laryngoscopy image classification, into local features, which refer to distinct anatomical regions of the vocal fold, particularly abnormal regions, including benign and malignant lesions. Our contribution can effectively aid laryngologists in identifying benign or malignant lesions of vocal folds and classifying images in the laryngeal endoscopy process visually.\",\"PeriodicalId\":516858,\"journal\":{\"name\":\"Journal of imaging informatics in medicine\",\"volume\":\" \",\"pages\":\"2794-2809\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11612113/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of imaging informatics in medicine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s10278-024-01068-z\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/5/29 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of imaging informatics in medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10278-024-01068-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/5/29 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

声带疾病的诊断和治疗在很大程度上依赖于喉镜的使用。全面的声带诊断需要在喉镜观察过程中准确识别关键的解剖结构和潜在的病变。然而，现有方法尚未探索决策过程的联合优化，包括同时进行物体检测和图像分类任务。在本研究中，我们提供了一个新的数据集 VoFoCD，该数据集包含 1724 张喉科图像，专门用于喉镜图像中的物体检测和图像分类。VoFoCD 数据集中的图像被分为四类，包括六种声门物体类型。此外，我们还提出了一种新颖的喉镜多任务高效网络（MEAL）来对声带图像进行分类，并检测声门地标和病变。为了进一步提高临床医生的可解释性，MEAL 提供了注意力图，将重要的学习区域可视化，以解释可支持临床决策的人工智能结果。我们还分析了模型在模拟临床场景中的有效性，在这些场景中喉镜检查过程会发生晃动。所提出的模型在 VoFoCD 数据集上表现出色。图像分类的准确度和物体检测的 0.5 联合阈值（mAP50）交叉点平均精度分别为 0.951 和 0.874。我们的 MEAL 方法将包含一般喉镜图像分类的全局知识整合到局部特征中，局部特征指的是声带的不同解剖区域，尤其是异常区域，包括良性和恶性病变。我们的贡献可有效帮助喉科专家识别声带的良性或恶性病变，并在喉内窥镜检查过程中直观地对图像进行分类。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Improving Laryngoscopy Image Analysis Through Integration of Global Information and Local Features in VoFoCD Dataset.

查看原文本刊更多论文

Improving Laryngoscopy Image Analysis Through Integration of Global Information and Local Features in VoFoCD Dataset.

The diagnosis and treatment of vocal fold disorders heavily rely on the use of laryngoscopy. A comprehensive vocal fold diagnosis requires accurate identification of crucial anatomical structures and potential lesions during laryngoscopy observation. However, existing approaches have yet to explore the joint optimization of the decision-making process, including object detection and image classification tasks simultaneously. In this study, we provide a new dataset, VoFoCD, with 1724 laryngology images designed explicitly for object detection and image classification in laryngoscopy images. Images in the VoFoCD dataset are categorized into four classes and comprise six glottic object types. Moreover, we propose a novel Multitask Efficient trAnsformer network for Laryngoscopy (MEAL) to classify vocal fold images and detect glottic landmarks and lesions. To further facilitate interpretability for clinicians, MEAL provides attention maps to visualize important learned regions for explainable artificial intelligence results toward supporting clinical decision-making. We also analyze our model's effectiveness in simulated clinical scenarios where shaking of the laryngoscopy process occurs. The proposed model demonstrates outstanding performance on our VoFoCD dataset. The accuracy for image classification and mean average precision at an intersection over a union threshold of 0.5 (mAP50) for object detection are 0.951 and 0.874, respectively. Our MEAL method integrates global knowledge, encompassing general laryngoscopy image classification, into local features, which refer to distinct anatomical regions of the vocal fold, particularly abnormal regions, including benign and malignant lesions. Our contribution can effectively aid laryngologists in identifying benign or malignant lesions of vocal folds and classifying images in the laryngeal endoscopy process visually.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of imaging informatics in medicine

自引率

0.00%

发文量