A key instance-guided frame-to-video information fusion network for thyroid ultrasound video instance segmentation
Guanyuan Chen, Ningbo Zhu, Bin Pu, Guanghua Tan, Hongxia Luo, Kenli Li
Knowledge-Based Systems, vol. 324, Article 113849 (published 2025-06-09). DOI: 10.1016/j.knosys.2025.113849
Video instance segmentation (VIS) of thyroid ultrasound (US) videos can automatically segment nodules and assign them instance labels, which helps count nodules and determine the thyroid key plane. However, thyroid US videos contain abundant background noise, and different structures often have similar echoes. In addition, the shape of tissue structures changes continuously along the video stream. Together, these factors make it extremely difficult for current mainstream VIS methods to capture temporal dynamic features, which limits their segmentation and tracking performance. To address this issue, this paper proposes NoVIS, a key instance-guided frame-to-video information fusion network for thyroid US video instance segmentation. First, a frame-level instance information enhancement module captures key instance information, constructing robust key instance features and increasing feature similarity within the same instance. Second, a key instance-guided global information aggregation module uses these key instance features to continuously update global temporal information, so as to capture how structures change along the video stream. Finally, a matching mechanism keeps training and inference consistent, further improving tracking stability. Extensive experiments on a collected thyroid US video dataset show that NoVIS significantly outperforms other methods in both tracking and segmentation. Specifically, NoVIS achieves an AP of 58.09% and an AP50 of 81.33%, which are 3.37% and 5.09% higher than those of the baseline method, respectively. Furthermore, the paper demonstrates that the proposed method can determine the thyroid key plane and count nodules, which is highly valuable for fully automated clinical thyroid diagnosis.
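The abstract does not specify how key instance features update the global temporal information or how the matching mechanism works. As a hedged illustration of the general idea behind embedding-based VIS tracking, and of why stable instance IDs matter for nodule counting, the sketch below links per-frame instance embeddings into tracks. Each track keeps a memory embedding refreshed by an exponential moving average (assumed here as a simple stand-in for continuously updated temporal information), new detections are matched one-to-one to tracks by maximizing cosine similarity, and the number of distinct track IDs serves as the nodule count. This is not the NoVIS implementation: the functions `cosine`, `best_assignment`, and `track_instances` and the parameters `momentum` and `min_sim` are hypothetical, and the exhaustive matching would normally be replaced by the Hungarian algorithm.

```python
# Hypothetical sketch: generic embedding-based instance tracking,
# NOT the actual NoVIS matching or aggregation mechanism.
import math
from itertools import permutations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_assignment(sim):
    """Optimal one-to-one matching of rows (tracks) to columns (detections).

    Maximizes the summed similarity by exhaustive search over permutations,
    which is only practical for the few nodules visible in one frame.
    Returns a dict mapping row index -> column index.
    """
    n_rows = len(sim)
    n_cols = len(sim[0]) if sim else 0
    best, best_score = {}, -math.inf
    if n_rows <= n_cols:
        # Each row picks a distinct column.
        for cols in permutations(range(n_cols), n_rows):
            score = sum(sim[r][c] for r, c in enumerate(cols))
            if score > best_score:
                best, best_score = dict(enumerate(cols)), score
    else:
        # More rows than columns: each column picks a distinct row.
        for rows in permutations(range(n_rows), n_cols):
            score = sum(sim[r][c] for c, r in enumerate(rows))
            if score > best_score:
                best, best_score = {r: c for c, r in enumerate(rows)}, score
    return best


def track_instances(frames, momentum=0.8, min_sim=0.5):
    """Give each per-frame instance embedding a persistent track ID.

    frames: list of frames; each frame is a list of embedding vectors,
    one per detected instance. A matched track's memory embedding is
    updated as momentum * old + (1 - momentum) * new.
    Returns (track IDs for every frame, total number of tracks).
    """
    memory = []  # memory[track_id] -> running embedding of that instance
    ids_per_frame = []
    for dets in frames:
        sim = [[cosine(m, d) for d in dets] for m in memory]
        assign = best_assignment(sim) if memory and dets else {}
        frame_ids = [None] * len(dets)
        for t, d in assign.items():
            if sim[t][d] >= min_sim:  # reject weak matches
                frame_ids[d] = t
                memory[t] = [momentum * m + (1 - momentum) * x
                             for m, x in zip(memory[t], dets[d])]
        for d, emb in enumerate(dets):
            if frame_ids[d] is None:  # unmatched detection starts a new track
                frame_ids[d] = len(memory)
                memory.append(list(emb))
        ids_per_frame.append(frame_ids)
    return ids_per_frame, len(memory)


if __name__ == "__main__":
    # Toy 2-D embeddings: two nodules swap order in frame 2, a third appears in frame 3.
    frames = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.1, 1.0], [1.0, 0.1]],
        [[0.9, 0.2], [0.0, 1.0], [-1.0, 0.0]],
    ]
    ids, n_tracks = track_instances(frames)
    print(ids)       # [[0, 1], [1, 0], [0, 1, 2]]
    print(n_tracks)  # 3
```

In the toy clip, swapping the order of the two nodules in frame 2 does not change their IDs, and the nodule that appears in frame 3 receives a new ID, so the final track count of 3 is the nodule count. The `min_sim` threshold is what keeps a genuinely new nodule from being forced onto an existing track.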