Radiology report generation from a singular perspective using transformers with Knowledge Distillation

IF 4.9 | Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL

Biomedical Signal Processing and Control Pub Date : 2025-07-16 DOI:10.1016/j.bspc.2025.108340

Asad Mansoor Khan, Mashood Mohammad Mohsan, Muhammad Usman Akram, Taimur Hassan, Sajid Gul Khawaja, Adil Qayyum

Volume 111, Article 108340. https://www.sciencedirect.com/science/article/pii/S1746809425008511

Citations: 0

Abstract


Radiology report generation from a singular perspective using transformers with Knowledge Distillation

Nearly two billion chest X-rays (CXRs) are performed annually, making them the most used imaging technique in radiology for the diagnosis of pulmonary disorders. The accompanying report with the findings from a chest X-ray forms a crucial part of the examination. An accurate report enables healthcare professionals to make better decisions about the care being provided. To this end, we propose an end-to-end radiology report generation framework built on transformers trained on text reports in conjunction with visual characteristics of the chest X-ray, to generate a reliable report that astutely describes the findings from a single CXR taken from either the Anterior-Posterior or the Posterior-Anterior position. A foundation model is utilised to perform Knowledge Distillation (KD) in conjunction with the Encoder, which is fine-tuned during the training phase. In addition, using a large corpus of radiology reports to pre-train the foundation model in an unsupervised manner is shown to improve performance on smaller datasets. This training methodology results in performance comparable to architectures that employ far more parameters. The proposed framework is evaluated on multiple datasets, including the Indiana University dataset, the MIMIC dataset, the MIMIC-PRO dataset, and the BRAX dataset. Incorporating KD increases the BLEU-1 score on the Indiana dataset by 4% and the BERTScore by 7.5%. Similarly, pre-training on larger datasets in combination with KD further increases the BLEU-1 score on the Indiana dataset by 7.2% and the BERTScore by 3%. For the MIMIC dataset, comparable performance is achieved for the Findings and the Impression sections of the report, while the proposed framework outperforms other techniques when both sections are combined. For the MIMIC-PRO dataset, an $s_{emb}$ score of 0.4069 and a RadGraph F1 score of 0.1165 are achieved, outperforming other techniques in the literature. Finally, the proposed framework is also evaluated on a locally gathered dataset and a BRAX subset without any re-training or fine-tuning, resulting in a BLEU-1 score of 0.3827 and a BERTScore of 0.4392 for the former and a BLEU-1 score of 0.1671 and a BERTScore of 0.2186 for the latter, showing generalisation ability.
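Since several of the reported results are BLEU-1 scores, a small illustration of the metric may help in reading them. The following is a minimal sketch of sentence-level BLEU-1 (clipped unigram precision multiplied by a brevity penalty), assuming lowercased whitespace tokenisation and a single reference. The function name `bleu1` and the two example sentences are hypothetical, and the paper's actual evaluation tooling and tokenisation are not stated in the abstract, so scores from this sketch are not directly comparable to the reported ones.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against a single reference (illustrative only)."""
    # Assumption: lowercased whitespace tokenisation.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Clip each candidate token count by its count in the reference,
    # so repeating a correct word is not rewarded more than once per match.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = overlap / len(cand)
    # The brevity penalty lowers the score of candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1.0 - len(ref) / len(cand))
    return bp * precision


# Hypothetical generated report sentence and ground-truth sentence.
generated = "no acute cardiopulmonary abnormality"
ground_truth = "no acute cardiopulmonary abnormality is seen"
score = bleu1(generated, ground_truth)
print(f"BLEU-1 = {score:.4f}")
```

In this example every generated word appears in the reference, so the unigram precision is 1.0 and the score is reduced only by the brevity penalty for the shorter output.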


Source journal

Biomedical Signal Processing and Control (Engineering & Technology: Biomedical Engineering)

CiteScore: 9.80

Self-citation rate: 13.70%

Articles published: 822

Review time: 4 months

About the journal: Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management. Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.