A Self-boosting Framework for Automated Radiographic Report Generation

Zhanyu Wang, Luping Zhou, Lei Wang, Xiu Li
{"title":"A Self-boosting Framework for Automated Radiographic Report Generation","authors":"Zhanyu Wang, Luping Zhou, Lei Wang, Xiu Li","doi":"10.1109/CVPR46437.2021.00246","DOIUrl":null,"url":null,"abstract":"Automated radiographic report generation is a challenging task since it requires to generate paragraphs describing fine-grained visual differences of cases, especially for those between the diseased and the healthy. Existing image captioning methods commonly target at generic images, and lack mechanism to meet this requirement. To bridge this gap, in this paper, we propose a self-boosting framework that improves radiographic report generation based on the cooperation of the main task of report generation and an auxiliary task of image-text matching. The two tasks are built as the two branches of a network model and influence each other in a cooperative way. On one hand, the image-text matching branch helps to learn highly text-correlated visual features for the report generation branch to output high quality reports. On the other hand, the improved reports produced by the report generation branch provide additional harder samples for the image-text matching branch and enforce the latter to improve itself by learning better visual and text feature representations. This, in turn, helps improve the report generation branch again. These two branches are jointly trained to help improve each other iteratively and progressively, so that the whole model is self-boosted without requiring external resources. Experimental results demonstrate the effectiveness of our method on two public datasets, showing its superior performance over multiple state-of-the-art image captioning and medical report generation methods.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR46437.2021.00246","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24

Abstract

Automated radiographic report generation is a challenging task because it requires generating paragraphs that describe fine-grained visual differences between cases, especially between diseased and healthy ones. Existing image captioning methods commonly target generic images and lack mechanisms to meet this requirement. To bridge this gap, we propose a self-boosting framework that improves radiographic report generation through the cooperation of a main task, report generation, and an auxiliary task, image-text matching. The two tasks are built as two branches of a network model and influence each other cooperatively. On one hand, the image-text matching branch helps the report generation branch learn highly text-correlated visual features so that it outputs high-quality reports. On the other hand, the improved reports produced by the report generation branch provide additional, harder samples for the image-text matching branch, forcing the latter to improve itself by learning better visual and text feature representations. This, in turn, further improves the report generation branch. The two branches are trained jointly so that they improve each other iteratively and progressively, and the whole model boosts itself without requiring external resources. Experimental results on two public datasets demonstrate the effectiveness of our method, showing superior performance over multiple state-of-the-art image captioning and medical report generation methods.
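To make the cooperative two-branch scheme concrete, the sketch below is a minimal, hypothetical PyTorch rendering of the training loop described in the abstract. It is not the paper's implementation: the encoder/decoder modules, their sizes, the hinge margin, and the exact way generated reports enter the matching loss (here, as hard negatives for their paired images) are all assumptions made for illustration.

```python
# A minimal, self-contained sketch of the two-branch "self-boosting" idea.
# All architectural choices are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfBoostingModel(nn.Module):
    """Toy two-branch model: a shared image encoder feeds a report-generation
    decoder and an image-text matching head."""

    def __init__(self, vocab_size=1000, dim=256):
        super().__init__()
        # Shared visual encoder (assumption: tiny CNN stand-in for the backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
        )
        # Report-generation branch (assumption: single-layer GRU decoder).
        self.embed = nn.Embedding(vocab_size, dim)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)
        # Image-text matching branch: projects pooled text features into the
        # image-feature space so matched pairs score higher than mismatches.
        self.text_proj = nn.Linear(dim, dim)

    def generate_logits(self, images, tokens):
        feats = self.encoder(images)                       # (B, D)
        h, _ = self.decoder(self.embed(tokens), feats.unsqueeze(0))
        return self.head(h)                                # (B, T, V)

    def match_score(self, images, tokens):
        feats = self.encoder(images)                       # (B, D)
        text = self.text_proj(self.embed(tokens).mean(1))  # (B, D)
        return (feats * text).sum(-1)                      # (B,) pair scores


def training_step(model, images, reports, opt, margin=0.2):
    # Branch 1: standard teacher-forced report-generation loss.
    logits = model.generate_logits(images, reports[:, :-1])
    gen_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), reports[:, 1:].reshape(-1))

    # Branch 2: matching loss. The self-boosting twist, as read from the
    # abstract, is that the model's own generated reports are fed back as
    # harder samples; here (an assumption) they act as hard negatives.
    with torch.no_grad():
        fake = logits.argmax(-1)                           # generated reports
    pos = model.match_score(images, reports[:, 1:])        # true pairs
    neg = model.match_score(images, fake)                  # generated negatives
    match_loss = F.relu(margin - pos + neg).mean()

    loss = gen_loss + match_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


if __name__ == "__main__":
    model = SelfBoostingModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    images = torch.randn(4, 3, 64, 64)                     # dummy radiographs
    reports = torch.randint(0, 1000, (4, 20))              # dummy token ids
    print(training_step(model, images, reports, opt))
```

Because the two losses share the encoder and are optimized together, each improvement in generation produces harder samples for matching, and better matching features feed back into generation; the hinge formulation above is just one simple way to express that coupling.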